This topic describes the factors that affect data synchronization speed, how to adjust concurrency to maximize synchronization speed, job throttling options, and solutions for slow data synchronization scenarios.
Overview
Data synchronization speed is affected by many factors, such as task configurations, database performance, and network conditions. For more information, see Factors that affect data synchronization speed.
Slow data synchronization can occur at different stages of the process. This topic describes solutions for slow performance at each stage. For more information, see Scenarios and solutions for slow data synchronization.
If the database performance is limited, a faster synchronization speed is not always better. A high speed may overstress the database and affect other production services. Data Integration provides throttling options that you can configure as needed. For more information, see Limit the synchronization speed.
Factors that affect data synchronization speed
Data synchronization speed is affected by factors such as the source and destination database environments and task configurations. You are primarily responsible for monitoring and tuning the performance, load, and network conditions of the source and destination databases.
The following factors affect data synchronization speed:
Factor | Description |
Source data source |
|
Resource group for scheduling used by the offline sync task | Offline sync tasks are dispatched by schedule resources to run on Data Integration execution resources. The usage of schedule resources also affects the overall synchronization efficiency. |
Offline sync task configuration |
|
Destination data source |
|
Scenarios and solutions for slow data synchronization
For more information about offline sync task logs, see Analyze offline sync logs.
Scenario of slow data synchronization | Phenomenon | Possible cause | Solution |
Waiting for schedule resources |
| Offline tasks are dispatched by a scheduling resource group to an engine for execution. If the number of tasks running on the scheduling resource group reaches its upper limit, new tasks must wait for running tasks to complete and release resources. | On the Operation Center page, you can view which tasks are occupying resources while the current task is waiting. Note If you use a shared resource group for scheduling, migrate the task to an exclusive or Serverless resource group for execution. |
Waiting for execution resources | The sync task log shows `wait`. | The remaining resources in the Data Integration resource group are insufficient to run the current task. For example, a resource group supports a maximum of eight concurrent threads. Three tasks are configured, each requiring three concurrent threads. If two tasks run at the same time, they use six threads. The resource group has only two threads left. The third task, which requires three threads, must wait because of insufficient resources. The log for this task shows `wait`. | Check if other tasks are running and using many resources in the resource group. You can use the following solutions to resolve this issue: Note
|
Sync task runs too slowly | The sync task log shows run, but the speed is 0. The task is running. If this state persists, click Detail log to view the execution details. |
Note The data synchronization speed cannot be guaranteed over the Internet. |
|
The log shows run and a non-zero speed, but the synchronization process is slow. |
Note The data synchronization speed cannot be guaranteed over the Internet. |
|
Limit the synchronization speed
By default, Data Integration sync tasks are not throttled. A task runs at the highest possible speed within the configured concurrency limit. However, a high speed may overstress the database and affect other production services. Data Integration provides a throttling option that you can configure as needed. After you enable throttling, we recommend that you set the maximum speed to no more than 30 MB/s. The following code shows how to configure throttling in the code editor to set a bandwidth limit of 1 MB/s.
"setting": {
  "speed": {
    "throttle": true, // Enables throttling.
    "mbps": 1 // The specific rate.
  }
}
The throttle parameter can be set to true or false:
When throttle is set to true, throttling is enabled. You must also set mbps to a specific value. If you do not set mbps, the task may fail or run at an abnormal rate.
When throttle is set to false, throttling is disabled, and the mbps configuration is ignored.
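The two throttle rules above can be sketched as a small Python helper that builds the speed block and rejects a throttled configuration that has no mbps value. The function name build_speed_setting is hypothetical and is not part of Data Integration. Only the throttle and mbps keys come from the configuration shown above.

```python
import json


def build_speed_setting(throttle, mbps=None):
    """Return a {"setting": {"speed": ...}} fragment (hypothetical helper)."""
    speed = {"throttle": bool(throttle)}
    if throttle:
        # With throttling enabled, mbps must be set explicitly.
        if mbps is None or mbps <= 0:
            raise ValueError("mbps must be a positive number when throttle is true")
        speed["mbps"] = mbps
    # With throttling disabled, mbps is ignored, so it is left out here.
    return {"setting": {"speed": speed}}


print(json.dumps(build_speed_setting(True, 1), indent=2))
```

Validating the pair before you paste the fragment into the code editor is one way to avoid submitting a task with throttle enabled and no rate.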
The traffic measure is a Data Integration metric and does not represent the actual network interface card (NIC) traffic. Typically, the NIC traffic is one to two times the channel traffic. The actual traffic overhead depends on the data serialization of the data storage system.
A single semi-structured file does not have a shard key. For multiple files, you can set a job speed limit. However, the effective speed limit is also related to the number of files.
For example, for n files, the effective speed limit is at most n MB/s:
If you set the speed limit to n+1 MB/s, the data is synchronized at n MB/s.
If you set the speed limit to n-1 MB/s, the data is synchronized at n-1 MB/s.
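The file-count rule above amounts to taking the smaller of the configured limit and the number of files. The following minimal sketch assumes exactly that rule and nothing else. The function name effective_speed_limit_mbps is hypothetical.

```python
def effective_speed_limit_mbps(configured_limit_mbps, file_count):
    """Effective limit for semi-structured files, per the rule described above.

    Assumption: the limit is capped at 1 MB/s per file, so n files
    can never be synchronized faster than n MB/s in total.
    """
    return min(configured_limit_mbps, file_count)


n = 4
print(effective_speed_limit_mbps(n + 1, n))  # configured limit above n
print(effective_speed_limit_mbps(n - 1, n))  # configured limit below n
```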
For a relational database, you must configure a shard key for the speed limit to be effective across multiple threads. Relational databases typically support only numeric shard keys. However, Oracle databases support both numeric and string shard keys.
FAQ
The `BatchSize` or `maxfilesize` parameter controls the number of records in a batch submission. A suitable value can reduce network interactions between Data Integration and the database and increase throughput. However, if this value is too large, an out-of-memory (OOM) error may occur in the synchronization process. If this error occurs, see FAQ for offline synchronization.
Appendix: Check the actual parallelism
On the log details page of a data sync task, find a log entry in the format `JobContainer - Job set Channel-Number to 2 channels.` The number before channels (2 in this example) is the actual degree of parallelism for the task.
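If you save the detail log as text, the channel count can be extracted with a regular expression. The following sketch assumes the log entry has exactly the format quoted above. The function name actual_parallelism is hypothetical.

```python
import re

# Matches the log entry format quoted above and captures the channel count.
CHANNEL_PATTERN = re.compile(r"Job set Channel-Number to (\d+) channels")


def actual_parallelism(log_text):
    """Return the channel count from the log text, or None if no entry is found."""
    match = CHANNEL_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


sample = "JobContainer - Job set Channel-Number to 2 channels."
print(actual_parallelism(sample))  # prints 2
```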
Appendix: Relationship between parallelism and resource usage
In an exclusive resource group, resource usage is determined by the relationship between concurrency and CPU, and between concurrency and memory:
Relationship between concurrency and CPU
In an exclusive resource group, the ratio of vCPUs to concurrency is 1:2. For example, an ECS machine with 4 vCPUs and 8 GiB of memory provides a concurrency quota of 8 for its exclusive resource group. It can run a maximum of eight offline sync tasks with a concurrency of 1, or four offline sync tasks with a concurrency of 2.
If the concurrency required by a new task submitted to an exclusive resource group is greater than the remaining concurrency quota of the group, the new task must wait. It runs after the running tasks in the group are complete and the remaining concurrency quota is sufficient for the new task.
Note If the concurrency set for a new task exceeds the maximum concurrency quota of the exclusive resource group, the task will be permanently stuck in the waiting state. For example, this occurs if you submit a task with a concurrency of 10 to an exclusive resource group on an ECS machine with 4 vCPUs and 8 GiB of memory. Because the resource group allocates resources based on the submission order, subsequent tasks will also be blocked.
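The quota rules above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not how the resource group is implemented. The function names and the returned state strings are hypothetical.

```python
def concurrency_quota(vcpus):
    """Concurrency quota of an exclusive resource group (vCPU:concurrency = 1:2)."""
    return vcpus * 2


def admission_state(requested, running, vcpus):
    """Classify a new task given the concurrency of the tasks already running."""
    quota = concurrency_quota(vcpus)
    if requested > quota:
        # The request exceeds the whole quota, so it can never be satisfied.
        return "never runs"
    if requested > quota - sum(running):
        # Not enough quota left right now; the task waits for running tasks.
        return "waits"
    return "runs"


# A machine with 4 vCPUs gives a quota of 8.
print(admission_state(3, [3, 3], vcpus=4))   # two running tasks leave only 2
print(admission_state(10, [], vcpus=4))      # larger than the whole quota
```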
Relationship between concurrency and memory
In an exclusive resource group, the memory occupied by a single task is calculated as Min{768 + (Concurrency - 1) × 256, 8029} MB. However, you can override this calculation in the task settings. In the code editor, set the JSON path $.setting.jvmOption.

Ensure that the total memory used by all running tasks is at least 1 GB less than the total memory of all machines in the exclusive resource group. This allows the tasks to run smoothly. If this condition is not met, the Linux OOM Killer mechanism may forcibly stop the tasks.
Note If you do not use the code editor to increase the task's memory, you only need to consider the concurrency quota limit of the exclusive resource group when you submit tasks.
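The default memory formula and the 1 GB headroom rule above can be checked with a short calculation. The sketch below assumes default task memory (no jvmOption override) and treats 1 GB as 1024 MB. The function names are hypothetical.

```python
def default_task_memory_mb(concurrency):
    """Default memory of one task: Min{768 + (Concurrency - 1) x 256, 8029} MB."""
    return min(768 + (concurrency - 1) * 256, 8029)


def fits_in_group(task_concurrencies, total_group_memory_mb):
    """Check that all running tasks leave at least 1 GB (assumed 1024 MB) free."""
    used = sum(default_task_memory_mb(c) for c in task_concurrencies)
    return used <= total_group_memory_mb - 1024


print(default_task_memory_mb(1))             # prints 768
print(default_task_memory_mb(2))             # prints 1024
print(fits_in_group([2, 2, 2, 2], 8 * 1024)) # four tasks on an 8 GiB machine
```

If you raise the memory of a task through jvmOption, replace the default value for that task with the configured one before you sum.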
Appendix: Synchronization speed
Read and write speeds vary greatly among different data sources. The following sections describe the average single-thread synchronization speed for typical data sources in an exclusive resource group:
Average single-thread speed for different Writer plugins
Writer | Average single-thread speed (KB/s) |
AnalyticDB for PostgreSQL | 147.8 |
AnalyticDB for MySQL | 181.3 |
ClickHouse | 5259.3 |
DataHub | 45.8 |
DRDS | 93.1 |
Elasticsearch | 74.0 |
FTP | 565.6 |
GDB | 17.1 |
HBase | 2395.0 |
hbase20xsql | 37.8 |
HDFS | 1301.3 |
Hive | 1960.4 |
HybridDB for MySQL | 323.0 |
HybridDB for PostgreSQL | 116.0 |
Kafka | 0.9 |
LogHub | 788.5 |
MongoDB | 51.6 |
MySQL | 54.9 |
ODPS | 660.6 |
Oracle | 66.7 |
OSS | 3718.4 |
OTS | 138.5 |
PolarDB | 45.6 |
PostgreSQL | 168.4 |
Redis | 7846.7 |
SQLServer | 8.3 |
Stream | 116.1 |
TSDB | 2.3 |
Vertica | 272.0 |
Average single-thread speed for different Reader plugins
Reader | Average single-thread speed (KB/s) |
AnalyticDB for PostgreSQL | 220.3 |
AnalyticDB for MySQL | 248.6 |
DRDS | 146.4 |
Elasticsearch | 215.8 |
FTP | 279.4 |
HBase | 1605.6 |
hbase20xsql | 465.3 |
HDFS | 2202.9 |
Hologres | 741.0 |
HybridDB for MySQL | 111.3 |
HybridDB for PostgreSQL | 496.9 |
Kafka | 3117.2 |
LogHub | 1014.1 |
MongoDB | 361.3 |
MySQL | 459.5 |
ODPS | 207.2 |
Oracle | 133.5 |
OSS | 665.3 |
OTS | 229.3 |
OTSStream | 661.7 |
PolarDB | 238.2 |
PostgreSQL | 165.6 |
RDBMS | 845.6 |
SQLServer | 143.7 |
Stream | 85.0 |
Vertica | 454.3 |
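The single-thread averages above can be used for a rough duration estimate. The sketch below rests on two assumptions that the tables do not state: the slower of the reader and writer sides limits the task, and speed scales linearly with concurrency. Treat the result as an order-of-magnitude guide only. The function name is hypothetical.

```python
def estimated_duration_seconds(data_size_kb, reader_kbps, writer_kbps, concurrency=1):
    """Rough estimate: data size divided by the bottleneck speed times concurrency.

    Assumptions (not from the tables): the slower side is the bottleneck,
    and throughput scales linearly with the number of threads.
    """
    if min(reader_kbps, writer_kbps) <= 0 or concurrency < 1:
        raise ValueError("speeds must be positive and concurrency at least 1")
    bottleneck_kbps = min(reader_kbps, writer_kbps)
    return data_size_kb / (bottleneck_kbps * concurrency)


# MySQL reader (459.5 KB/s) to MySQL writer (54.9 KB/s), 1 GiB of data, 4 threads.
seconds = estimated_duration_seconds(1024 * 1024, 459.5, 54.9, concurrency=4)
print(f"about {seconds / 60:.0f} minutes")
```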


If the Detail log shows a large value for the 

