
DataWorks:SQL Server data source

Last Updated:Oct 28, 2025

The SQL Server data source provides a bidirectional channel to read data from and write data to SQL Server. This topic describes the data synchronization capabilities that DataWorks provides for SQL Server.

Supported versions

SQL Server Reader uses the com.microsoft.sqlserver sqljdbc4 4.0 JDBC driver. For more information about the capabilities of the driver, see the official Microsoft documentation. The following table lists the SQL Server versions that the driver supports.

| Version | Supported (Yes/No) |
| --- | --- |
| SQL Server 2016 | Yes |
| SQL Server 2014 | Yes |
| SQL Server 2012 | Yes |
| PDW 2008R2 AU34 | Yes |
| SQL Server 2008 R2 | Yes |
| SQL Server 2008 | Yes |
| SQL Server 2019 | No |
| SQL Server 2018 | No |
| Azure SQL Managed Instance | No |
| Azure Synapse Analytics | No |
| Azure SQL Database | Yes |

Limits

Offline synchronization supports reading data from views.
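For example, an offline (batch) reader can point the table parameter at a view instead of a base table. The following fragment is a minimal sketch; the data source name sql_server_source and the view name dbo.v_orders are hypothetical:

```json
{
    "stepType": "sqlserver",
    "parameter": {
        "datasource": "sql_server_source",
        "table": "dbo.v_orders",// A view can be read in place of a base table in offline synchronization.
        "column": ["id", "name"],
        "where": ""
    },
    "name": "Reader",
    "category": "reader"
}
```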

Supported field types

For a complete list of SQL Server field types, see the SQL Server Help documentation. The following table lists the support for common field types in SQL Server 2016 as an example.

| SQL Server 2016 field type | SQL Server Reader | SQL Server Writer |
| --- | --- | --- |
| bigint | Supported | Supported |
| bit | Supported | Supported |
| decimal | Supported | Supported |
| int | Supported | Supported |
| money | Supported | Supported |
| numeric | Supported | Supported |
| smallint | Supported | Supported |
| smallmoney | Supported | Supported |
| tinyint | Supported | Supported |
| float | Supported | Supported |
| real | Supported | Supported |
| date | Supported | Supported |
| datetime2 | Supported | Supported |
| datetime | Supported | Supported |
| datetimeoffset | Not supported | Not supported |
| smalldatetime | Supported | Supported |
| time | Supported | Supported |
| char | Supported | Supported |
| text | Supported | Supported |
| varchar | Supported | Supported |
| nchar | Supported | Supported |
| ntext | Supported | Supported |
| nvarchar | Supported | Supported |
| binary | Supported | Supported |
| image | Supported | Supported |
| varbinary | Supported | Supported |
| cursor | Not supported | Not supported |
| hierarchyid | Not supported | Not supported |
| sql_variant | Supported | Supported |
| Spatial geometry types | Not supported | Not supported |
| table | Not supported | Not supported |
| rowversion | Not supported | Not supported |
| uniqueidentifier | Supported | Supported |
| xml | Supported | Supported |
| Spatial geography types | Not supported | Not supported |

The following table lists the mappings between SQL Server data types and the data types that SQL Server Reader and SQL Server Writer support.

| Category | SQL Server data type |
| --- | --- |
| Integer | BIGINT, INT, SMALLINT, and TINYINT |
| Floating-point | FLOAT, DECIMAL, REAL, and NUMERIC |
| String | CHAR, NCHAR, NTEXT, NVARCHAR, TEXT, VARCHAR, NVARCHAR(MAX), and VARCHAR(MAX) |
| Date and time | DATE, DATETIME, and TIME |
| Boolean | BIT |
| Binary | BINARY, VARBINARY, VARBINARY(MAX), and TIMESTAMP |

Add a data source

Before you develop a synchronization task in DataWorks, add the required data source by following the instructions in Data Source Management. When you add the data source, you can view the infotips of parameters in the DataWorks console to understand their meanings.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides:

  • Configure a batch synchronization task for a single table

  • Configure a batch read synchronization task for an entire database

  • Configure a real-time synchronization task for an entire database

FAQ

  • Data restoration for active/standby synchronization

You can use an active/standby disaster recovery architecture for SQL Server, in which the secondary database continuously restores data from the primary database through transaction logs. Synchronization between the primary and secondary databases always involves some latency. In some cases, such as when network latency is high, the data restored to the secondary database can differ significantly from the data in the primary database, so data synchronized from the secondary database is not a complete, real-time snapshot.

  • Consistency constraints

    SQL Server is a relational database management system (RDBMS) and provides interfaces for strongly consistent data queries. For example, during a synchronization task, SQL Server Reader does not retrieve any data updates that are written during the task because of the database's snapshot feature.

The preceding consistency guarantee applies only when SQL Server Reader runs in a single-threaded model. If you configure SQL Server Reader to retrieve data concurrently, strong data consistency cannot be guaranteed.

    When SQL Server Reader shards data based on the splitPk parameter, it starts multiple concurrent tasks to complete the data synchronization. These concurrent tasks do not belong to the same read transaction and have time intervals between them. Therefore, the synchronized data is not a complete and consistent data snapshot.

    Consistent snapshots for multi-threaded tasks are not technically feasible. You can address this issue only from an engineering perspective. Engineering methods involve trade-offs. The following solutions are provided. You can choose one based on your requirements.

    • Use single-threaded synchronization. This means you do not shard the data. The disadvantage is that the speed is relatively slow, but consistency can be guaranteed.

    • Stop other data writers to ensure that the current data is static. For example, you can lock tables or stop synchronization to the secondary database. The disadvantage is that this may affect online services.

  • Database encoding

    SQL Server Reader uses Java Database Connectivity (JDBC) to retrieve data. JDBC is compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for SQL Server Reader. It automatically retrieves and converts the encoding.

  • Methods for incremental data synchronization

    SQL Server Reader uses JDBC SELECT statements to retrieve data. You can use SELECT…WHERE… to retrieve incremental data in the following ways:

    • When an online application writes data to the database, you can populate a modification timestamp field for every change, including inserts, updates, and logical deletions. For this type of application, SQL Server Reader needs only a WHERE clause that filters on the timestamp of the previous synchronization run.

    • For append-only streaming data, SQL Server Reader can add a WHERE clause that filters on the maximum auto-increment ID from the previous run.

    If no business fields are available to distinguish new and modified data, SQL Server Reader cannot perform incremental data synchronization and can synchronize only full data.

  • SQL security

    SQL Server Reader provides the querySql parameter for you to implement custom SELECT statements. SQL Server Reader does not perform any security checks on the querySql statement.
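The timestamp-based incremental approach described above can be sketched as a reader parameter fragment. This is a minimal illustration: the table dbo.orders and the gmt_modified column are hypothetical, and $bizdate is resolved by DataWorks scheduling parameters:

```json
{
    "stepType": "sqlserver",
    "parameter": {
        "datasource": "sql_server_source",
        "table": "dbo.orders",
        "column": ["id", "name", "gmt_modified"],
        "where": "gmt_modified > '$bizdate'"// Retrieves only the rows changed since the previous synchronization run.
    },
    "name": "Reader",
    "category": "reader"
}
```

If no such timestamp or auto-increment field exists, the where parameter is left empty and the task falls back to a full synchronization.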

Appendix: Script examples and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must set the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following sections describe the data source parameters that you must configure.

Reader script example

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"sqlserver",// The plug-in name.
            "parameter":{
                "datasource":"",// The data source.
                "column":[// The fields.
                    "id",
                    "name"
                ],
                "where":"",// The filter condition.
                "splitPk":"",// If you specify splitPk, the data is sharded based on the specified field.
                "table":""// The data table.
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The number of error records.
        },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. If you set throttle to false, the mbps parameter does not take effect, and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
            "concurrent":1, // The number of concurrent jobs.
            "mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

If you want to use a querySql query, the following code provides an example of the Reader part of the script. In this example, the SQL Server data source is sql_server_source, the table to query is dbo.test_table, and the column to query is name.

{
    "stepType": "sqlserver",
    "parameter": {
        "connection": [
            {
                "querySql": ["select name from dbo.test_table"],
                "datasource": "sql_server_source"
            }
        ],
        "datasource": "sql_server_source"
    },
    "name": "Reader",
    "category": "reader"
}

Reader script parameters

  • datasource (Required. No default value.)

    The name of the data source. The value must be the same as the name of the data source that you add in the code editor.

  • table (Required. No default value.)

    The name of the source table. A task can synchronize data from only one table.

  • column (Required. No default value.)

    The columns from which you want to synchronize data, specified as a JSON array. To synchronize all columns, set this parameter to ["*"]. This parameter cannot be empty: you must explicitly specify the columns to synchronize.

    • You can select a subset of columns to export.

    • You can export columns in an order different from the schema of the source table.

    • You can configure constants. You must follow the SQL syntax. Example: ["id", "table", "1", "'mingya.wmy'", "'null'", "to_char(a+1)", "2.3", "true"].

      • id is a regular column name.

      • table is a column name that contains a reserved word.

      • 1 is an integer constant.

      • 'mingya.wmy' is a string constant. Note that it must be enclosed in single quotation marks.

      • 'null' is the string null, not a NULL value.

      • to_char(a+1) is a function expression.

      • 2.3 is a floating-point constant.

      • true is a Boolean value.

  • splitFactor (Optional. Default value: 5.)

    The split factor, which specifies the number of shards for data synchronization. If multiple concurrent threads are configured, the data is sharded into Concurrency × splitFactor shards. For example, if Concurrency is 5 and splitFactor is 5, the data is sharded into 25 shards and executed on 5 concurrent threads.

    Note: The recommended value is an integer from 1 to 100. An excessively large value may cause an out-of-memory (OOM) error.

  • splitPk (Optional. No default value.)

    The field on which data is sharded when SQL Server Reader extracts data. If you specify splitPk, the data synchronization system starts concurrent tasks to synchronize the data, which can improve efficiency.

    • We recommend that you use the primary key of the table as the value of splitPk. A primary key is usually evenly distributed, which prevents data hotspots in the shards.

    • Currently, splitPk supports sharding only for integer data. If you specify a field of another data type, such as a string, floating-point number, or date, SQL Server Reader reports an error.

  • where (Optional. No default value.)

    The filter condition. SQL Server Reader combines the specified column, table, and where parameters into an SQL statement to retrieve data. For example, to synchronize only the data generated on the current day, you can set the where condition to gmt_create > $bizdate.

    • The where condition allows for efficient incremental synchronization.

    • If the where condition is empty, all data in the table is synchronized.

  • querySql (Optional. No default value.)

    A custom SQL statement for data retrieval. In some business scenarios, the where parameter is not sufficient to describe the filter conditions, for example when the synchronization requires a join across multiple tables, such as select a,b from table_a join table_b on table_a.id = table_b.id. If you configure querySql, SQL Server Reader ignores the table, column, and where parameters and uses this statement to retrieve data.

  • fetchSize (Optional. Default value: 1024.)

    The number of records to retrieve in each batch. This parameter determines the number of network round trips between Data Integration and the database server, and a suitable value can improve data retrieval performance.

    Note: An excessively large fetchSize value (greater than 2048) may cause an OOM error in the data synchronization process.

  • SQL Server Reader combines the information that you configure for the table, column, and where parameters into an SQL statement and sends the statement to the SQL Server database.

  • SQL Server Reader sends the statement that you configure for querySql to the SQL Server database as-is.
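As an illustration of how table, column, and where combine, consider the following fragment with hypothetical table and column names:

```json
{
    "table": "dbo.orders",// Hypothetical table and columns, for illustration only.
    "column": ["id", "name"],
    "where": "gmt_create > '2025-10-01'"
}
```

With this configuration, the statement sent to SQL Server is equivalent to select id, name from dbo.orders where gmt_create > '2025-10-01'.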

Writer script example

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"sqlserver",// The plug-in name.
            "parameter":{
                "postSql":[],// The SQL statement to execute after the data synchronization task.
                "datasource":"",// The data source.
                "column":[// The fields.
                    "id",
                    "name"
                ],
                "table":"",// The table name.
                "preSql":[]// The SQL statement to execute before the data synchronization task.
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The number of error records.
        },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. If you set throttle to false, the mbps parameter does not take effect, and no rate limit is imposed. If you set throttle to true, a rate limit is imposed.
            "concurrent":1, // The number of concurrent jobs.
            "mbps":"12"// The rate limit. 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer script parameters

  • datasource (Required. No default value.)

    The name of the data source. The value must be the same as the name of the data source that you add in the code editor.

  • table (Required. No default value.)

    The name of the destination table.

  • column (Required. No default value.)

    The destination columns to which you want to write data. Separate the column names with commas. Example: "column":["id","name","age"]. To write data to all columns in order, use an asterisk (*). Example: "column":["*"].

  • preSql (Optional. No default value.)

    An SQL statement that is executed before the data synchronization task starts. In the codeless UI, you can execute only one SQL statement. In the code editor, you can execute multiple SQL statements. For example, you can use this parameter to clear previous data from the destination table.

  • postSql (Optional. No default value.)

    An SQL statement that is executed after the data synchronization task is complete. In the codeless UI, you can execute only one SQL statement. In the code editor, you can execute multiple SQL statements. For example, you can use this parameter to add a timestamp.

  • writeMode (Optional. Default value: insert.)

    The write mode. Only insert is supported. If a primary key or unique index conflict occurs, Data Integration treats the conflicting row as dirty data and retains the data that already exists in the destination table.

  • batchSize (Optional. Default value: 1024.)

    The number of records to submit in each batch. A suitable value can greatly reduce the number of network interactions between Data Integration and SQL Server and improve overall throughput. An excessively large value may cause an OOM error in the data synchronization process.
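For example, preSql is commonly used to clear the destination table before a full reload. The following writer fragment is a minimal sketch; the data source name sql_server_target and the table dbo.orders_copy are hypothetical:

```json
{
    "stepType": "sqlserver",
    "parameter": {
        "datasource": "sql_server_target",
        "table": "dbo.orders_copy",
        "column": ["id", "name"],
        "preSql": ["TRUNCATE TABLE dbo.orders_copy"],// Clears old data before the task writes.
        "postSql": [],
        "writeMode": "insert",
        "batchSize": 1024
    },
    "name": "Writer",
    "category": "writer"
}
```

Because preSql runs unconditionally before every task instance, a destructive statement such as TRUNCATE is appropriate only when the task reloads the full dataset each run.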