What to do if the synchronization speed is slow? How to increase or limit the offline synchronization rate? - DataWorks

This topic describes the factors that affect data synchronization speed, how to adjust concurrency to maximize synchronization speed, job throttling options, and solutions for slow data synchronization scenarios.

Overview

Data synchronization speed is affected by many factors, such as task configurations, database performance, and network conditions. For more information, see Factors that affect data synchronization speed.
Slow data synchronization can occur at different stages of the process. This topic describes solutions for slow performance at each stage. For more information, see Scenarios and solutions for slow data synchronization.
If the database performance is limited, a faster synchronization speed is not always better. A high speed may overstress the database and affect other production services. Data Integration provides throttling options that you can configure as needed. For more information, see Limit the synchronization speed.

Factors that affect data synchronization speed

Data synchronization speed is affected by factors such as the source and destination database environments and task configurations. You are primarily responsible for monitoring and tuning the performance, load, and network conditions of the source and destination databases.

The following factors affect data synchronization speed:

Factor	Description
Source data source	Database performance: CPU, memory, SSD, network, and disk performance. Concurrency: A higher concurrency on the data source results in a heavier database load. A database with better performance can handle higher concurrency. This lets you configure more concurrent data extractions for the data synchronization job. Network: The bandwidth (throughput) and speed of the network.
Resource group for scheduling used by the offline sync task	Offline sync tasks are dispatched by schedule resources to run on Data Integration execution resources. The usage of schedule resources also affects the overall synchronization efficiency.
Offline sync task configuration	Transfer speed: Whether an upper limit is set for the task synchronization speed. Concurrency: The maximum number of threads that can read from the source or write to the destination data storage in parallel. Waiting for resources: Whether the task has to wait for resources before it runs. The Bytes setting: Bytes is 1048576 for a single thread. If the task is sensitive to network speed, a timeout may occur. In this case, set Bytes to a smaller value. Indexes: Whether an index is created for the fields used in query statements.
Destination data source	Performance: CPU, memory, SSD, network, and disk performance. Load: An excessive load on the destination database affects the data write efficiency of the sync task. Network: The bandwidth (throughput) and speed of the network.

Scenarios and solutions for slow data synchronization

Note

For more information about offline sync task logs, see Analyze offline sync logs.

Scenario of slow data synchronization	Phenomenon	Possible cause	Solution
Waiting for schedule resources	Phenomenon 1: The sync task log shows the task is waiting for the gateway. Phenomenon 2: The instance properties page shows a long wait time for resources.	Offline tasks are dispatched by a scheduling resource group to an engine for execution. If the number of tasks running on the scheduling resource group reaches its upper limit, new tasks must wait for running tasks to complete and release resources.	On the Operation Center page, you can view which tasks are occupying resources while the current task is waiting. Note If you use a shared resource group for scheduling, migrate the task to an exclusive or Serverless resource group for execution.
Waiting for execution resources	The sync task log shows `wait`.	The remaining resources in the Data Integration resource group are insufficient to run the current task. For example, a resource group supports a maximum of eight concurrent threads. Three tasks are configured, each requiring three concurrent threads. If two tasks run at the same time, they use six threads. The resource group has only two threads left. The third task, which requires three threads, must wait because of insufficient resources. The log for this task shows `wait`.	Check if other tasks are running and using many resources in the resource group. You can use the following solutions to resolve this issue: Note On the Operation Center page, you can view the resource usage and the information about the tasks that are using resources while the current task is waiting. The maximum number of concurrent threads that a resource group can run varies based on its specifications. For more information, see Performance metrics and billing standards. Check if the tasks that occupy resources are stuck or have slowed down significantly. If so, resolve these issues first or stop some of the tasks. If the tasks are not stuck, wait for them to complete and release the resources. Then, start the current task. You can also find the list of tasks that are using the resources and their owners. Coordinate with them to reduce the concurrency of their tasks. You can also reduce the concurrency of the current sync task and then resubmit and publish it. You can also scale out the resource group for execution. For more information, see Scale-out and scale-in operations.
Sync task runs too slowly	The sync task log shows `run`, but the speed is 0. The task is running. If this state persists, click Detail log to view the execution details. If the Detail log shows a large value for the WaitReaderTime parameter, the task is waiting a long time for the source to return data.	The source shard key is not configured properly. The SQL statements generated based on the shard key to read data from the source database execute slowly. The SQL statements used to read data from the source take a long time to execute (for example, the `where` or `querySql` parameters in some plugins). Scenario example: A data synchronization task slows down because of a full table scan. This happens when the fields in the `WHERE` clause are not indexed. The database load is high at the time of synchronization. Network issues exist, such as limited bandwidth (throughput) or low network speed. Note The data synchronization speed cannot be guaranteed over the Internet.	To resolve slow statement execution: When you configure pre- or post-SQL statements: Ensure that an index is added to the fields used for data filtering. This prevents the sync task from performing a full table scan. Avoid or reduce complex processing, such as using functions. If necessary, perform these operations in the database before synchronization. Check if the source data table contains too much data. If so, split the data into multiple tasks. Query the logs to find the blocked SQL statements and consult a database administrator for a solution. Check the database load at the time of synchronization.
	The sync task log shows `run`, but the speed is 0. The task is running. If this state persists, click Detail log to view the execution details. If the Detail log shows a large value for the WaitWriterTime parameter, the task is taking a long time to write data to the destination.	The pre- or post-SQL statements configured in the writer plugin execute slowly (for example, the SQL statements configured in the `preSql` or `postSql` parameters in some plugins). The database load is high at the time of synchronization. Network issues exist, such as limited bandwidth (throughput) or low network speed. Note The data synchronization speed cannot be guaranteed over the Internet.
	The log shows `run` and a non-zero speed, but the synchronization process is slow.	The shard key for a relational database task is not configured properly. This causes the concurrency setting to be ineffective, and the task runs with a single thread. The concurrency is set too low. A large amount of dirty data is generated during synchronization, which affects the speed. Database performance issues exist. Note A database with better performance can handle higher concurrency. This lets you configure a higher concurrency for the data synchronization job. Network issues exist, such as limited bandwidth (throughput) or low network speed. Note The data synchronization speed cannot be guaranteed over the Internet.	Configure the shard key properly. For more information about configuring a task shard key, see Configure a task in the codeless UI. Within the maximum concurrency supported by the resource group, plan the concurrency for each task and increase the concurrency for the current task as needed. In the codeless UI, configure the concurrency to specify the degree of parallelism for the task. In the code editor, configure the concurrency in the task script. Note The maximum number of concurrent threads that a resource group can run varies based on its specifications. For more information, see Performance metrics and billing standards. Handle dirty data. For more information about dirty data, see Data Integration. When you set the concurrency for distributed tasks, the number of machines in the resource group cannot exceed the maximum concurrency of a single machine in that group. When you synchronize data across clouds or regions, establish a network connection and use an internal network for synchronization. For more information about network connectivity solutions, see Network connectivity solutions. Check the database load.

Limit the synchronization speed

By default, Data Integration sync tasks are not throttled. A task runs at the highest possible speed within the configured concurrency limit. However, a high speed may overstress the database and affect other production services. Data Integration provides a throttling option that you can configure as needed. After you enable throttling, we recommend that you set the maximum speed to no more than 30 MB/s. The following code shows how to configure throttling in the code editor to set a bandwidth limit of 1 MB/s.

"setting": {
      "speed": {
         "throttle": true, // Enables throttling.
         "mbps": 1 // The specific rate, in MB/s.
      }
    }

The throttle parameter can be set to true or false:
- When throttle is set to true, throttling is enabled. You must set a specific data value for mbps. If you do not set mbps, the program encounters an error or the rate is abnormal.
- When throttle is set to false, throttling is disabled, and the mbps configuration is ignored.
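The two rules above can be checked before a task is submitted. The following Python sketch builds the `speed` fragment and rejects a throttled configuration that has no `mbps` value. The function name `build_speed_setting` is hypothetical and is not part of Data Integration. Only the `throttle` and `mbps` keys come from this topic.

```python
import json
from typing import Optional


def build_speed_setting(throttle: bool, mbps: Optional[float] = None) -> dict:
    """Return the "speed" fragment of a task's "setting" block.

    Hypothetical helper for illustration only.
    """
    if not throttle:
        # With throttling disabled, mbps is ignored, so it is left out.
        return {"throttle": False}
    if mbps is None or mbps <= 0:
        # A throttled task without a usable mbps value is rejected up front.
        raise ValueError("mbps must be a positive number when throttle is true")
    return {"throttle": True, "mbps": mbps}


if __name__ == "__main__":
    setting = {"setting": {"speed": build_speed_setting(True, 1)}}
    print(json.dumps(setting, indent=2))
```

Validating the fragment in a script is one possible way to catch a missing `mbps` value early. It does not replace the checks that Data Integration performs at run time.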
The traffic measure is a Data Integration metric and does not represent the actual network interface card (NIC) traffic. Typically, the NIC traffic is one to two times the channel traffic. The actual traffic overhead depends on the data serialization of the data storage system.
A single semi-structured file does not have a shard key. For multiple files, you can set a job speed limit. However, the effective speed limit is also related to the number of files.
For example, for n files, the effective speed limit is n MB/s:
- If you set the speed limit to n+1 MB/s, the data is synchronized at n MB/s.
- If you set the speed limit to n-1 MB/s, the data is synchronized at n-1 MB/s.
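The example above amounts to taking the smaller of the configured limit and the number of files. The following Python sketch expresses that reading. The function name is hypothetical, and the rule is inferred only from the two cases listed above, so treat it as an approximation rather than the exact scheduling behavior.

```python
def effective_file_speed_limit(configured_mbps: float, file_count: int) -> float:
    """Approximate effective limit (MB/s) for a job that reads file_count files.

    Inferred from the example in this topic: n files cap the job at n MB/s.
    """
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    return min(configured_mbps, file_count)


if __name__ == "__main__":
    n = 5
    print(effective_file_speed_limit(n + 1, n))  # limit above n is capped at n
    print(effective_file_speed_limit(n - 1, n))  # limit below n applies as set
```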
For a relational database, you must configure a shard key for the speed limit to be effective across multiple threads. Relational databases typically support only numeric shard keys. However, Oracle databases support both numeric and string shard keys.

FAQ

FAQ for offline synchronization.
The `BatchSize` or `maxfilesize` parameter controls the number of records in a batch submission. A suitable value can reduce network interactions between Data Integration and the database and increase throughput. However, if this value is too large, an out-of-memory (OOM) error may occur in the synchronization process. If this error occurs, see FAQ for offline synchronization.

Appendix: Check the actual parallelism

On the log details page of a data sync task, find a log entry in the format `JobContainer - Job set Channel-Number to 2 channels.` The number before `channels` (2 in this example) is the actual degree of parallelism for the task.
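If you save the log to a file, you can extract this value with a script. The following Python sketch searches log text for the entry format shown above. The function name `actual_parallelism` is hypothetical, and the pattern assumes that the log line matches the format in this topic exactly.

```python
import re
from typing import Optional

# Matches the entry format shown in this topic.
CHANNEL_PATTERN = re.compile(r"Job set Channel-Number to (\d+) channels")


def actual_parallelism(log_text: str) -> Optional[int]:
    """Return the channel count from the first matching log entry, or None."""
    match = CHANNEL_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "JobContainer - Job set Channel-Number to 2 channels."
    print(actual_parallelism(sample))  # prints 2
```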

Appendix: Relationship between parallelism and resource usage

In an exclusive resource group, resource usage is determined by the relationship between concurrency and CPU, and between concurrency and memory:

Relationship between concurrency and CPU
In an exclusive resource group, the ratio of vCPUs to concurrency is 1:2. For example, an ECS machine with 4 vCPUs and 8 GiB of memory provides a concurrency quota of 8 for its exclusive resource group. It can run a maximum of eight offline sync tasks with a concurrency of 1, or four offline sync tasks with a concurrency of 2.
If the concurrency required by a new task submitted to an exclusive resource group is greater than the remaining concurrency quota of the group, the new task must wait. It runs after the running tasks in the group are complete and the remaining concurrency quota is sufficient for the new task.
Note
If the concurrency set for a new task exceeds the maximum concurrency quota of the exclusive resource group, the task will be permanently stuck in the waiting state. For example, this occurs if you submit a task with a concurrency of 10 to an exclusive resource group on an ECS machine with 4 vCPUs and 8 GiB of memory. Because the resource group allocates resources based on the submission order, subsequent tasks will also be blocked.
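The quota rules above can be sketched as a small calculation. The following Python example assumes a single machine and the 1:2 ratio of vCPUs to concurrency described in this section. All function names are hypothetical.

```python
from typing import Sequence


def concurrency_quota(vcpus: int) -> int:
    """Concurrency quota for a machine, using the 1:2 vCPU-to-concurrency ratio."""
    return vcpus * 2


def can_ever_run(task_concurrency: int, vcpus: int) -> bool:
    """False means the task would wait permanently, because it exceeds the quota."""
    return task_concurrency <= concurrency_quota(vcpus)


def must_wait(task_concurrency: int, running: Sequence[int], vcpus: int) -> bool:
    """True if the remaining quota is too small for the new task right now."""
    remaining = concurrency_quota(vcpus) - sum(running)
    return task_concurrency > remaining


if __name__ == "__main__":
    print(concurrency_quota(4))        # 4 vCPUs give a quota of 8
    print(can_ever_run(10, 4))         # a concurrency of 10 never fits
    print(must_wait(3, [3, 3], 4))     # 2 slots left, so a 3-thread task waits
```

Checking `can_ever_run` before submission is one way to avoid a task that blocks the tasks submitted after it.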
Relationship between concurrency and memory
In an exclusive resource group, the memory occupied by a single task is calculated as Min{768 + (Concurrency - 1) × 256, 8029} MB. However, you can override this calculation in the task settings. In the code editor, set the JSON path $.setting.jvmOption.
Ensure that the total memory used by all running tasks is at least 1 GB less than the total memory of all machines in the exclusive resource group. This allows the tasks to run smoothly. If this condition is not met, the Linux OOM Killer mechanism may forcibly stop the tasks.
Note
If you do not use the code editor to increase the task's memory, you only need to consider the concurrency quota limit of the exclusive resource group when you submit tasks.
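As a worked example of the memory formula, the following Python sketch computes the default memory per task and checks the 1 GB headroom rule. It assumes that 1 GB means 1,024 MB and that no task overrides its memory through `$.setting.jvmOption`. The function names are hypothetical.

```python
from typing import Sequence

HEADROOM_MB = 1024  # assumption: the "1 GB" of headroom is taken as 1,024 MB


def default_task_memory_mb(concurrency: int) -> int:
    """Default memory per task: Min{768 + (Concurrency - 1) x 256, 8029} MB."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return min(768 + (concurrency - 1) * 256, 8029)


def fits_in_memory(concurrencies: Sequence[int], total_memory_mb: int) -> bool:
    """True if all running tasks leave at least HEADROOM_MB of memory free."""
    used = sum(default_task_memory_mb(c) for c in concurrencies)
    return used <= total_memory_mb - HEADROOM_MB


if __name__ == "__main__":
    print(default_task_memory_mb(1))         # 768
    print(default_task_memory_mb(2))         # 1024
    print(default_task_memory_mb(30))        # 8029, the cap applies
    print(fits_in_memory([2, 2, 2, 2], 8192))  # four tasks on an 8 GiB machine
```

For a machine with 4 vCPUs and 8 GiB of memory, four tasks with a concurrency of 2 use 4,096 MB under the default formula, which leaves more than the required headroom.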

Appendix: Synchronization speed

Read and write speeds vary greatly among different data sources. The following sections describe the average single-thread synchronization speed for typical data sources in an exclusive resource group:

Average single-thread speed for different Writer plugins

Writer	Average single-thread speed (KB/s)
AnalyticDB for PostgreSQL	147.8
AnalyticDB for MySQL	181.3
ClickHouse	5259.3
DataHub	45.8
DRDS	93.1
Elasticsearch	74.0
FTP	565.6
GDB	17.1
HBase	2395.0
hbase20xsql	37.8
HDFS	1301.3
Hive	1960.4
HybridDB for MySQL	323.0
HybridDB for PostgreSQL	116.0
Kafka	0.9
LogHub	788.5
MongoDB	51.6
MySQL	54.9
ODPS	660.6
Oracle	66.7
OSS	3718.4
OTS	138.5
PolarDB	45.6
PostgreSQL	168.4
Redis	7846.7
SQLServer	8.3
Stream	116.1
TSDB	2.3
Vertica	272.0

Average single-thread speed for different Reader plugins

Reader	Average single-thread speed (KB/s)
AnalyticDB for PostgreSQL	220.3
AnalyticDB for MySQL	248.6
DRDS	146.4
Elasticsearch	215.8
FTP	279.4
HBase	1605.6
hbase20xsql	465.3
HDFS	2202.9
Hologres	741.0
HybridDB for MySQL	111.3
HybridDB for PostgreSQL	496.9
Kafka	3117.2
LogHub	1014.1
MongoDB	361.3
MySQL	459.5
ODPS	207.2
Oracle	133.5
OSS	665.3
OTS	229.3
OTSStream	661.7
PolarDB	238.2
PostgreSQL	165.6
RDBMS	845.6
SQLServer	143.7
Stream	85.0
Vertica	454.3