Nodes and workflows in the project directory often require recurring scheduling. You can configure scheduling properties, such as the scheduling period, dependencies, and parameters, on the scheduling configuration panel for a node or workflow. This topic describes how to configure scheduling properties.
Prerequisites
A node must be created. In DataWorks, tasks are developed based on nodes. Tasks for different engine types are encapsulated as different node types. You can select a node type based on your requirements. For more information, see Node development.
The periodic scheduling switch must be turned on. Tasks in a DataWorks workspace are automatically scheduled based on their configurations only if the Enable Periodic Scheduling switch is turned on. You can turn on this switch on the Scheduling Settings page for the workspace. For more information, see System Settings.
Precautions
The scheduling configurations of a task only define its properties at runtime. The task is scheduled based on these configurations only after it is published to the production environment.
The scheduling time specifies only the expected running time of a task. The actual running time depends on the running status of the ancestor nodes. For more information about the conditions for running a task, see Diagnose a running task.
DataWorks lets you create dependencies between different types of tasks. Before you proceed, we recommend that you read the Principles and examples of scheduling configurations for complex dependencies document to understand the preset dependencies in DataWorks for this scenario.
In DataWorks, a recurring instance is generated for a scheduling node based on the scheduling type and period that you specify. For example, if you configure a node to run hourly, a corresponding number of hourly instances are generated for the node each day. The node runs automatically using these recurring instances. For more information, see View recurring instances.
If you use scheduling parameters, the parameter values in the code for each cycle of a DataWorks scheduling node are determined by the scheduled time of the cycle and the scheduling parameter expressions that you specify. For more information about how parameter values relate to the configuration and replacement of scheduling parameters, see Supported formats of scheduling parameters.
A workflow includes the workflow node and inner nodes. Their dependencies are complex. This topic describes only the dependencies and scheduling of individual nodes. For more information about the scheduling dependencies of a workflow, see Recurring workflow.
Go to the scheduling configuration page
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose the DataStudio entry in the Actions column.
Go to the scheduling configuration page.
On the DataStudio page, find the node and open its configuration tab.
In the right-side navigation bar of the node configuration tab, click Scheduling Configuration to open the node scheduling configuration page.
Configure scheduling properties for a node
On the scheduling configuration page of a node, you must configure Scheduling Parameters, Scheduling Policy, Scheduling Time, Scheduling Dependencies, and Node Output Parameters for the node.
(Optional) Scheduling parameters
If you define variables in the code when you edit the node, you must assign values to the variables in this section.
Scheduling parameters are automatically replaced with specific values based on the business time of the scheduled task and the value format of the scheduling parameters. This allows for the dynamic replacement of parameters within the scheduling time of the task.
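To illustrate how this replacement works, the following Python sketch resolves a date-offset expression, in the spirit of the DataWorks `$bizdate` convention (the day before the scheduled time, formatted as yyyymmdd), against an instance's scheduled time. The resolver itself is a simplified, hypothetical illustration, not platform code.

```python
from datetime import datetime, timedelta

def resolve_bizdate(scheduled_time: datetime, offset_days: int = -1) -> str:
    """Resolve a date expression against the scheduled time of an instance.

    By convention, bizdate is the day before the scheduled time, formatted
    as yyyymmdd. This resolver is a simplified illustration only.
    """
    return (scheduled_time + timedelta(days=offset_days)).strftime("%Y%m%d")

# An instance scheduled for 2024-06-15 00:00 sees bizdate = 20240614.
print(resolve_bizdate(datetime(2024, 6, 15)))  # 20240614
```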
Configure scheduling parameters
You can define scheduling parameters in one of the following ways.
Method | Description |
Add a parameter | You can configure multiple scheduling parameters for a scheduling node. To use multiple scheduling parameters, click Add Parameter once for each parameter. |
Load parameters from code | This method automatically detects the variable names that are defined in the code of the current node and adds them as scheduling parameters for subsequent use. Note: In most cases, a variable name is defined in the code in the ${variable name} format. The method for defining variable names for PyODPS and general Shell nodes is different from that for other types of nodes. For more information about the formats of scheduling parameters for different types of nodes, see Examples of scheduling parameter configurations for different types of nodes. |
Supported formats of scheduling parameters
For more information, see Supported formats of scheduling parameters.
Check the scheduling parameter configurations of the task in the production environment
To prevent issues caused by unexpected scheduling parameters when a recurring task runs, we recommend that you go to the Auto Triggered Task page in Operation Center to check the scheduling parameter configurations for the recurring task in the production environment after the task is published. For more information about how to view a recurring task, see Manage auto triggered tasks.
Scheduling policy
The scheduling policy defines the instance generation mode, scheduling type, computing resources, and resource groups for a recurring task.
Parameter | Description |
Instance generation mode | After a node is submitted and published to the scheduling system in the production environment, the platform generates recurring instances for automatic scheduling based on the Instance Generation Mode configured for the node. |
Scheduling type | Specifies how generated instances are handled: a normally scheduled task runs as scheduled, a dry-run task directly returns a successful state at its scheduled time without running the code, and a frozen (paused) task does not run. |
Timeout period | If you set a timeout period, the task is automatically terminated when its running time exceeds the specified period. |
Rerun property | Specifies whether the node can be rerun in specific situations. The rerun property cannot be empty. Typical options are to allow a rerun regardless of the running result, to allow a rerun only upon failure, or to disallow reruns. |
Automatic rerun upon failure | If you enable this feature, when a task fails to run (excluding cases in which the user actively stops the task), the scheduling system automatically reruns the task based on the specified number of retries and the retry interval. |
Computing resource | Specifies the computing engine resources required to run the task. To create new resources, use computing resource management. |
Computing quota | For MaxCompute SQL and MaxCompute Script nodes, you can configure the computing quota that provides computing resources (CPU and memory) for the computing job. |
Schedule resource group | Specifies the resource group for scheduling that is used to run the task. Select a resource group based on your requirements. |
Dataset | Click the add icon to add a dataset for the task. |
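The "automatic rerun upon failure" behavior described above can be pictured with a small sketch: rerun a failing task up to a configured number of retries with a fixed interval, and never retry a task that was actively stopped by a user. The function and exception names are illustrative, not part of any DataWorks API.

```python
import time

class ManuallyTerminated(Exception):
    """Raised when a user actively stops the task; such tasks are never retried."""

def run_with_retries(task, max_retries: int = 3, interval_seconds: float = 0.0) -> bool:
    """Run `task` and retry on failure, mimicking 'automatic rerun upon failure'."""
    for attempt in range(1 + max_retries):
        try:
            task()
            return True                      # success: no further reruns
        except ManuallyTerminated:
            raise                            # user-initiated stop: do not retry
        except Exception:
            if attempt == max_retries:
                return False                 # retries exhausted
            time.sleep(interval_seconds)     # wait for the retry interval
    return False

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")

print(run_with_retries(flaky))  # True (succeeds on the third attempt)
```

The key design point mirrored here is that a manual termination propagates instead of being swallowed, because the platform treats user-initiated stops differently from runtime failures.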
Scheduling time
The scheduling time is used to configure the period, time, and other information for the automatic execution of a scheduling node.
If the node is in a workflow, parameters related to Scheduling Time are set in the Scheduling Configuration on the workflow page. If the node is not in a workflow, the Scheduling Time is set in the Scheduling Configuration for each node.
Precautions
The scheduling frequency of a task is independent of the scheduling period of its ancestor tasks
A task's scheduling frequency depends on its own scheduling period, not the scheduling period of its ancestor tasks.
DataWorks supports dependencies between tasks with different scheduling periods
In DataWorks, a recurring instance is generated for a scheduling node based on the scheduling type and period that you specify. For example, if you configure a node to run hourly, a corresponding number of hourly instances are generated for the node each day. The node runs using these instances. Dependencies set for a recurring task are essentially dependencies between the instances that the tasks generate. If the scheduling types of ancestor and descendant nodes are different, the number of recurring instances generated and their dependencies will also be different. For more information about dependencies between ancestor and descendant nodes with different scheduling periods, see Select a dependency type (cross-cycle dependency).
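As a rough illustration (hypothetical helper, not a DataWorks API), generating one instance per cycle for a single day looks like the following. An hourly task yields 24 instances while a daily task yields 1, which is why a cross-period dependency is really a dependency between two unequal sets of instances.

```python
from datetime import datetime, timedelta

def generate_instances(day: datetime, period_hours: int) -> list[datetime]:
    """Generate the scheduled times of the recurring instances for one day.

    Simplified sketch: one instance per cycle, starting at 00:00.
    """
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [start + timedelta(hours=h) for h in range(0, 24, period_hours)]

hourly = generate_instances(datetime(2024, 6, 15), period_hours=1)
daily = generate_instances(datetime(2024, 6, 15), period_hours=24)
print(len(hourly), len(daily))  # 24 1
```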
Tasks that are not scheduled on a daily basis perform a dry-run
In DataWorks, tasks that are not scheduled on a daily basis, such as weekly or monthly tasks, still generate instances every day. On days outside the scheduled cycle, these instances perform a dry run: when the scheduled time is reached, the instance immediately returns a successful state without running the task code. If a daily descendant task exists, this successful state triggers the descendant task to run as scheduled, even though the ancestor only performed a dry run.
Task running time description
This setting only defines the expected scheduling time for the task. The actual running time of the task is affected by multiple factors, such as the scheduled time of the ancestor node, the availability of task execution resources, and the task's actual running conditions. For more information, see Conditions for running a task.
Configure the scheduling time
Parameter | Description |
Scheduling period | The scheduling period is the interval at which a task is automatically run in a scheduling scenario. It defines how often the code logic of a node actually runs in the scheduling system of the production environment. A recurring instance is generated for a scheduling node based on the scheduling type and period that you specify. For example, if you configure a node to run on an hourly basis, the specified number of hourly instances are generated for the node every day, and the recurring task runs automatically using these instances. Important: For weekly, monthly, and yearly scheduling, instances are still generated every day during non-scheduling periods. These instances show a successful state, but they only perform a dry run and do not actually run the task. |
Effective date | The scheduling node takes effect and is automatically scheduled within the effective date range. Tasks that exceed the effective date will not be automatically scheduled. These tasks are expired tasks. You can view the number of expired tasks on the O&M dashboard and unpublish them as needed. |
Cron expression | This expression is automatically generated based on the time property configuration and does not need to be configured. |
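Although the cron expression is generated automatically, it can help to see how the time properties map onto it. The following sketch (a simplified illustration, not platform code) builds a Quartz-style expression, with an assumed field order of seconds, minutes, hours, day-of-month, month, and day-of-week, for a task that runs every hour at a given minute within a time range:

```python
def build_cron(minute: int = 0, start_hour: int = 0, end_hour: int = 23) -> str:
    """Build a Quartz-style cron expression for an hourly task.

    Assumed field order: seconds minutes hours day-of-month month day-of-week.
    Illustrative only; the platform generates its cron expressions for you.
    """
    return f"00 {minute:02d} {start_hour:02d}-{end_hour:02d}/1 * * ?"

print(build_cron())  # 00 00 00-23/1 * * ?
```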
Scheduling dependencies
The scheduling dependencies of a task in DataWorks refer to the Directed Acyclic Graph (DAG) between nodes in a scheduling scenario. A descendant node task starts to run only after its ancestor node tasks run successfully. Configuring scheduling dependencies ensures that the scheduling task can obtain the correct data when it runs. After an ancestor node runs successfully, DataWorks detects that the latest data for the ancestor table is generated, which allows the descendant node to retrieve the data. This prevents the descendant node from failing to retrieve data because the ancestor table data has not been generated.
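The "ancestors run before descendants" rule of a DAG can be sketched with the standard-library topological sorter. The task names below are hypothetical; the point is only that any valid run order executes every ancestor before the nodes that depend on it.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of ancestor tasks it depends on.
dependencies = {
    "ods_log": set(),              # source-table task, no ancestors
    "dwd_clean": {"ods_log"},      # runs only after ods_log succeeds
    "ads_report": {"dwd_clean"},   # runs only after dwd_clean succeeds
}

# static_order yields a run order in which every ancestor precedes
# its descendants (and raises CycleError if the graph is not acyclic).
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['ods_log', 'dwd_clean', 'ads_report']
```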
Precautions
After node dependencies are configured, a descendant node runs only after all of the ancestor nodes that it depends on have run successfully. This prevents the current task from encountering data quality issues caused by reading data that has not been fully generated.
The actual running time of a task depends on both its own scheduled time and the completion time of its ancestor tasks. If an ancestor task has not finished running, the descendant task will not run, even if its scheduled time is earlier than that of the ancestor task. For more information about the conditions for running a task, see Diagnose a running task.
Configure scheduling dependencies
The primary purpose of task dependencies in DataWorks is to ensure that descendant nodes can retrieve data correctly. This is essentially a data lineage dependency between ancestor and descendant tables. You can choose whether to configure scheduling dependencies based on the data lineage of the tables according to your business needs. The process for configuring node scheduling dependencies is as follows.
After a node dependency is configured, a strong dependency exists by default between the output tables of the ancestor and descendant nodes. Therefore, when you configure scheduling dependencies for a task, you must confirm if a strong data lineage dependency exists. A strong data lineage dependency exists if the data output of the descendant node depends on the data output of the ancestor node. This confirmation prevents issues where the current task cannot retrieve data because the ancestor data has not been generated.
Ordinal number | Description |
① | To prevent the current task from running at an unexpected time, you can first assess whether there is a strong dependency between the tables and confirm whether you need to configure scheduling dependencies based on data lineage. |
② | Confirm whether the current scenario involves table data generated by a recurring task. If table data is not generated by a recurring scheduling task in DataWorks, DataWorks cannot monitor data output through the task running status, so such tables do not support scheduling dependencies. Tables whose data is not generated by recurring scheduling in DataWorks include but are not limited to tables whose data is uploaded from an on-premises machine and tables whose data is written by real-time synchronization tasks or by systems outside DataWorks. |
③④ | Choose to depend on the same cycle or the previous cycle of the ancestor node based on whether you need to depend on yesterday's or today's data from the ancestor node, and whether an hourly or minute-based task needs to depend on its own previous hour or minute instance.
Note For details on configuring scheduling dependency scenarios based on data lineage, see Select a dependency type (same-cycle dependency). |
⑤⑥⑦ | After the dependency is configured and published to the production environment, you can check whether the task dependency meets expectations in the Auto Triggered Task section of Operation Center. |
Configure custom node dependencies
If no strong data lineage dependency exists between tasks in DataWorks, or if the dependent data is not from a table generated by a recurring scheduling node, you can customize the node's dependencies. For example, a task may not strongly depend on a specific partition of the ancestor data but only retrieves data from the latest partition at the current time. Another example is when data is from a locally uploaded table. You can configure custom dependencies in the following ways:
Depend on the root node of the workspace
In scenarios where the input data for a synchronization task originates from other business databases, or where an SQL-type task processes table data generated by a real-time synchronization task, you can attach the dependency directly to the root node of the workspace.
Depend on a zero load node
If a workspace contains many or complex business processes, you can use a zero load node to manage them. You can attach the dependencies of nodes that require central management to a specific zero load node to clarify the data forwarding path in the workspace. For example, you can control the overall scheduling time of a business process or control its overall scheduling, including freezing it (disabling scheduling).
Node output parameters
After you define an output parameter and its value for an ancestor node, you can define an input parameter for a descendant node whose value references the output parameter of the ancestor node. This allows the descendant node to use this parameter to obtain the value passed from the ancestor node.
Precautions
The Output Parameter of a node is used only as an input parameter for a descendant node. You can add a parameter in the scheduling parameter section of the descendant node and associate it with the ancestor parameter by clicking the corresponding icon in the Actions column. Some nodes cannot directly pass the query results of an ancestor node to a descendant node. To pass the query results of an ancestor node to a descendant node, use an assignment node. For more information, see Assignment node.
The following node types support node output parameters: EMR Hive, EMR Spark SQL, ODPS Script, Hologres SQL, AnalyticDB for PostgreSQL, and MySQL.
Configure node output parameters
The value of a Node Output Parameter can be a Constant or a Variable.
After you define the output parameter and submit the current node, you can select Bind The Output Parameter Of The Ancestor Node to use it as an input parameter for the descendant node when you configure scheduling parameters for the descendant node.

Parameter name: The name of the defined output parameter.
Parameter value: The value of the output parameter. The value type can be a constant or a variable:
A constant is a fixed string.
A variable can be a system-supported global variable, a built-in scheduling parameter, or a custom parameter.
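The data flow described above, where an ancestor publishes an output parameter and a bound input parameter carries the value into the descendant, can be mirrored in a short sketch. The node functions and the parameter name `region_list` are hypothetical; in DataWorks the binding itself is done on the scheduling configuration panel, not in code.

```python
def run_ancestor() -> dict:
    """The ancestor node finishes and publishes its output parameters
    as name -> value pairs."""
    return {"region_list": "cn-hangzhou,cn-shanghai"}

def run_descendant(inputs: dict) -> str:
    """The descendant reads the value through its bound input parameter."""
    return f"processing regions: {inputs['region_list']}"

outputs = run_ancestor()          # ancestor succeeds and exposes its outputs
print(run_descendant(outputs))    # descendant consumes the passed value
```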
References
Scheduling parameters: For more information, see Formats of scheduling parameters.
Scheduling policy:
For more information, see Generate instances immediately after publishing.
For more information, see Dry-run a task.
Scheduling time: For more information, see Scheduling time.
Scheduling dependencies:
For more information, see Cross-cycle dependencies.
For more information, see Same-cycle dependencies.
For more information, see Dependencies in complex scenarios.
For more information, see Dependencies in special scenarios.
Other references:
For more information, see Impact of daylight saving time switch on the running of scheduling tasks.