
DataWorks: Auto triggered workflow

Last Updated: Jul 14, 2025

DataWorks provides auto triggered workflows where nodes are scheduled to run on a regular basis. This topic describes how to use an auto triggered workflow.

Background information

Workflows are an automated management tool for data processing pipelines. You can drag different types of nodes onto the configuration tab of a workflow to easily configure scheduling dependencies among them. This accelerates the construction of a data processing pipeline and improves task development efficiency.

Supported scheduling dependencies

The configuration of scheduling dependencies for an auto triggered workflow is much like that for a common node. Nodes in an auto triggered workflow or an auto triggered workflow as a whole can depend on or be depended on by other nodes. The following dependencies are supported:

  • A workflow as a whole can be depended on by other independent tasks or workflows.

  • A workflow as a whole can depend on other independent tasks or workflows.

  • Tasks in an auto triggered workflow can depend on other independent tasks or workflows.

  • Tasks in an auto triggered workflow can be depended on by other independent tasks or workflows.

Scheduling dependencies (figure)

Status changes of auto triggered workflows in the running process

You can specify a scheduling time for an auto triggered workflow. The running of nodes in the auto triggered workflow is affected by the scheduling time. If a node depends on the auto triggered workflow, the running of the node is affected by the scheduling time of the auto triggered workflow. In scheduling scenarios, the status of an auto triggered workflow is affected by the status of tasks in the workflow.

Auto triggered workflow status changes

Status of an auto triggered instance:

  • Not Run

  • Waiting for Scheduling Time to Arrive

  • Running

  • Successful

  • Failed

  • Frozen

Special scenarios:

  • If an instance is frozen or suspended in a workflow, the entire workflow instance enters the failed status.

  • If a data backfill instance generated for a task in a workflow is frozen, the workflow instance enters the successful status.

  • If a task cannot be run in data backfill scenarios, the workflow to which the task belongs enters the failed status.

  • A time difference exists between the time when the instance status changes and the actual time when a failure event is generated.

  • If a merge node exists in a workflow, ancestor nodes of the merge node may fail. In this case, determine whether the workflow is successful based on whether the merge node is successful. (The sketch following this list illustrates how these status rules combine.)
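A minimal Python sketch of how task instance statuses might roll up into a workflow instance status under the rules above. This is a conceptual illustration only: the status strings and the derive_workflow_status helper are assumptions made for this sketch, not part of any DataWorks API.

```python
# Conceptual sketch only: models the status rules described above.
# Status names and this helper are illustrative assumptions, not DataWorks APIs.

def derive_workflow_status(task_statuses, is_backfill=False, merge_node_status=None):
    """Derive an overall workflow instance status from its task instance statuses."""
    # If the workflow contains a merge node, its status is the deciding signal:
    # ancestor nodes of the merge node may fail without failing the workflow.
    if merge_node_status is not None:
        return "Successful" if merge_node_status == "Successful" else "Failed"

    if is_backfill:
        # In data backfill scenarios, a frozen instance does not block the workflow,
        # but a task that cannot be run fails the workflow it belongs to.
        task_statuses = ["Successful" if s == "Frozen" else s for s in task_statuses]
    elif any(s in ("Frozen", "Suspended") for s in task_statuses):
        # A frozen or suspended instance fails the entire workflow instance.
        return "Failed"

    if any(s == "Failed" for s in task_statuses):
        return "Failed"
    if all(s == "Successful" for s in task_statuses):
        return "Successful"
    return "Running"


print(derive_workflow_status(["Successful", "Frozen"]))                    # Failed
print(derive_workflow_status(["Successful", "Frozen"], is_backfill=True))  # Successful
```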

Execution time and parameter replacement for nodes in an auto triggered workflow

  • You do not need to configure a scheduling cycle for nodes in a workflow. You only need to configure the delayed execution time for each node, which indicates the amount of time by which the node's run is delayed relative to the scheduling time of the workflow.

  • The actual running time of a node in a workflow is calculated based on the delayed execution time configured for the node and the scheduling time configured for the workflow.

  • The values of scheduling parameters for nodes in a workflow are determined by the overall scheduling time of the workflow, not by the delayed execution time, as shown in the sketch below.
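For example, if a workflow is scheduled to run at 01:00 and a node in it is configured with a 30-minute delayed execution time, the node starts at about 01:30, but its scheduling parameters are still resolved from the 01:00 workflow scheduling time. The following minimal Python sketch illustrates this calculation; the variable names and the business-date convention (one day before the scheduling time) are assumptions for illustration, not DataWorks APIs.

```python
# Conceptual sketch: node run time vs. parameter resolution in a workflow.
# Names and the business-date convention are illustrative assumptions.
from datetime import datetime, timedelta

workflow_scheduled_time = datetime(2025, 7, 14, 1, 0)  # workflow scheduling time: 01:00
node_delay = timedelta(minutes=30)                     # delayed execution time of one node

# The node actually starts after the configured delay ...
node_run_time = workflow_scheduled_time + node_delay

# ... but scheduling parameters are resolved from the workflow scheduling time,
# for example a business-date parameter based on the day before that time.
bizdate = (workflow_scheduled_time - timedelta(days=1)).strftime("%Y%m%d")

print(node_run_time)  # 2025-07-14 01:30:00
print(bizdate)        # 20250713
```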

Precautions

  • Only new-version Data Studio supports auto triggered workflows.

  • When creating a workflow, select Periodic Scheduling as the scheduling type.

  • A referenceable workflow cannot depend on another workflow, and it is not automatically scheduled after it is deployed to the production environment.

    Important
    • A referenceable workflow cannot depend on or be depended on by another workflow. The nodes in the workflow cannot depend on, or be depended on by, nodes that do not belong to the workflow, such as the root node of a workspace. Otherwise, when you deploy the referenceable workflow, an error is reported and the workflow cannot be deployed as expected.

    • After a referenceable workflow is deployed to the production environment, the workflow is not automatically scheduled until it is referenced by another workflow by using a SUB_PROCESS node.

Go to the Create Workflow page

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left-side navigation pane of the Data Studio page, click the image icon. In the Workspace Directories section of the DATA STUDIO pane, click the image icon on the right side and select Create Workflow.

    Note

    The first time you perform operations in the Workspace Directories section of the DATA STUDIO pane, you can directly click Create Workflow to create a workflow.

Create an auto triggered workflow

  1. On the Create Workflow page, select Periodic Scheduling as the scheduling type.

    Note
    • Auto triggered workflows allow you to define how frequently tasks are triggered. You can specify a schedule to automatically run tasks at designated times.

    • To create a trigger-based workflow, see Trigger-based workflow.

  2. Enter a workflow name and click OK to create it.

Design the auto triggered workflow

On the left side of the configuration tab of the workflow, select node types based on the types of tasks that you want to develop and drag them to the canvas on the right. Then, draw lines between the nodes to configure their scheduling dependencies.

Note
  • DataWorks encapsulates the capabilities of different compute engines in different types of nodes. You can use nodes of different compute engine types to perform data development in a visualized manner, without the need to run complex commands on the compute engines. You can also use the general nodes of DataWorks to design complex logic.

  • A single workflow can contain up to 400 nodes. For better readability and maintainability, we recommend keeping the number of nodes under 100.

Develop the auto triggered workflow

You can perform the following steps to develop the workflow. You can also configure references between workflows and transparently pass the workflow parameters to the parameters of the nodes in the workflow. For more information, see Appendix: Workflow reference.

  1. Go to the configuration tab of a node in the workflow.

    On the configuration tab of the workflow, move the pointer over a desired node and click Open Node to go to the configuration tab of the node.

  2. Develop the node.

    On the configuration tab of the node, edit node code. Take note of the following items during code development:

    • The code syntax depends on the node type that you selected. Different types of tasks differ in scheduling configuration. For more information, see Node development.

    • You can enable the intelligent programming assistant Copilot to obtain intelligent code completion suggestions and improve development efficiency.

    • For most node types, you can define variables in the ${Variable name} format. This way, you can assign different scheduling parameters to the variables as values to facilitate task code debugging.

    • When you run a scheduling node in a workflow, you can reference workflow parameters in the ${workflow.parameter name} format in the code editor of the node to obtain their values. (An illustrative substitution sketch follows these steps.)

  3. Debug and run the node.

    1. Configure debugging parameters. After you edit the code, you can click the Debugging Configurations tab in the right-side navigation pane of the configuration tab of the node to configure debugging parameters.

      1. In the Computing Resource section, specify a computing resource for tasks on the node to be debugged.

      2. In the DataWorks Configurations section, specify a resource group that is used to run tasks in DataWorks.

      3. If you defined variables in the ${Variable name} format in the node code, assign constants to the variables in the Script Parameters section.

    2. Debug and run the node. After the configuration is complete, click Run in the top toolbar of the configuration tab of the node to run the node based on the debugging parameters configured on the Debugging Configurations tab.
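As noted in the node development step above, node code can contain variables in the ${Variable name} format and workflow parameters in the ${workflow.parameter name} format, which DataWorks resolves when the node is debugged or scheduled. The following Python sketch only mimics that substitution conceptually; the parameter names bizdate and workflow.region are hypothetical examples chosen for this sketch.

```python
# Conceptual sketch of ${...} placeholder substitution in node code.
# DataWorks performs this substitution itself at debugging or scheduling time;
# the parameter names below are hypothetical examples.
import re

node_code = "SELECT * FROM sales WHERE ds = '${bizdate}' AND region = '${workflow.region}';"

params = {
    "bizdate": "20250713",             # value assigned from a scheduling or debugging parameter
    "workflow.region": "cn-shanghai",  # value passed through from a workflow parameter
}

resolved = re.sub(r"\$\{([^}]+)\}", lambda m: params.get(m.group(1), m.group(0)), node_code)
print(resolved)
# SELECT * FROM sales WHERE ds = '20250713' AND region = 'cn-shanghai';
```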

Debug the auto triggered workflow

After you finish debugging individual nodes, debug the workflow as a whole.

  1. Click the image icon in the top toolbar of the configuration tab of the workflow. In the Enter runtime parameters dialog box, configure the Value Used in This Run parameter and click OK.

  2. Click the nodes on the configuration tab of the workflow to view execution results.

Deploy the auto triggered workflow

You can refer to the following procedure to define scheduling settings for a workflow and nodes in the workflow, and deploy the workflow to the production environment for periodic scheduling.

  1. Configure scheduling settings for nodes in the workflow.

    The procedure of configuring scheduling settings for nodes in a workflow is basically the same as that for common nodes. For more information, see Scheduling dependencies. Take note of the following items when you configure scheduling settings for nodes in a workflow:

    • You do not need to separately configure a scheduling time for nodes in a workflow. Instead, you can configure the delayed execution time for nodes in a workflow. The delayed execution time indicates the duration by which the running of a node lags behind the running of the related workflow.

    • The values of variables used in the code of nodes in a workflow are assigned based on the scheduling time of the workflow.

  2. Configure scheduling settings for the workflow.

    Configure scheduling parameters, scheduling time, and scheduling dependencies for the workflow.

  3. Deploy the workflow.

    Click the image icon in the top toolbar of the configuration tab of the workflow. On the DEPLOY tab, click Start Deployment to Production Environment. The workflow is deployed based on the check and deployment process. For more information, see Node or workflow deployment.

What to do next: Workflow O&M

After an auto triggered workflow is deployed, the auto triggered workflow is scheduled on a regular basis. You can view the status of the auto triggered workflow in Operation Center in the production environment and perform O&M operations on the auto triggered workflow. For more information, see Overview and Backfill data and view data backfill instances (new version).

Appendix: Workflow reference

Scenario 1: Reference a workflow

You can use a SUB_PROCESS node in a workflow to reference another workflow. To enable a workflow to be referenceable, perform the following steps: In the right-side navigation pane of the configuration tab of the workflow, click Property. On the tab that appears, turn on Referencable.

Important

After a referenceable workflow is deployed to the production environment, the workflow is not automatically scheduled until it is referenced by another workflow by using a SUB_PROCESS node.

The scheduling time of a referenceable workflow depends on the scheduling time of the workflow that references it.
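A minimal sketch of this referencing behavior, assuming a parent workflow that references a sub-workflow through a SUB_PROCESS node. The names are illustrative assumptions, not DataWorks APIs.

```python
# Conceptual sketch: a referenceable workflow only runs when it is referenced,
# and its effective scheduling time follows the referencing (parent) workflow.
# Names are illustrative assumptions, not DataWorks APIs.
from datetime import datetime

parent_workflow_scheduled_time = datetime(2025, 7, 14, 2, 0)
referenced_via_sub_process = True  # referenced by a SUB_PROCESS node in the parent

if referenced_via_sub_process:
    # The sub-workflow's scheduling time is driven by the parent workflow.
    print("Sub-workflow runs at", parent_workflow_scheduled_time)
else:
    print("Referenceable workflow is deployed but not automatically scheduled")
```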