DataWorks:RestAPI (HTTP) data source

Last Updated:Oct 28, 2025

A RestAPI data source lets you read JSON data from a RESTful API and write it to another data source, such as MaxCompute, by using a data synchronization task. A RestAPI data source can also serve as a destination that receives data from other data sources. This topic describes the data synchronization capabilities of the RestAPI data source in DataWorks.

Limits

Supported field types

Important

When data is synchronized to a destination, only a single-layer table schema is supported. Nested field structures are not supported. For example, if an API returns the structure `{"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}`, the nested fields must be mapped to parallel fields such as `user_id`, `user_name`, and `value` at the destination.
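
The mapping from a nested structure to parallel fields can be sketched in Python. This is an illustration only, not the plugin's implementation: the helper name `flatten` and the underscore separator are assumptions chosen to match the `user_id` and `user_name` example above.

```python
def flatten(record, parent_key="", sep="_"):
    """Turn nested objects into a single-layer dict with joined key names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested object and prefix its keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


response = {"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}
row = flatten(response["data"])
print(row)  # {'user_id': 1, 'user_name': 'lily', 'value': 123}
```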

Type classification | Data Integration column configuration type
Integer             | LONG, INT
String              | STRING
Floating-point      | DOUBLE, FLOAT
Boolean             | BOOLEAN
Date and time       | DATE

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configuration guide for a single-table offline sync task

Examples

FAQ

  • Can I page through data only by specifying the number of pages?

    • Answer: Yes. You must specify the number of pages.

  • Is automatic paging supported? For example, can paging stop when no more data is returned for the request parameters?

    • Answer: No. Without a known number of pages, the data cannot be split.

  • If I must specify the number of pages, but the specified number is greater than the actual number of pages, what happens when the subsequent pages are empty?

    • Answer: If a subsequent page is empty, it is treated as an empty result from a SQL query. The system proceeds to the next query.

  • Is only single-layer JSON parsing supported?

    • Answer: Yes, it is. Deep parsing is not performed.

  • How do I configure a non-array type for a RestAPI in DataWorks Data Integration?

    • Answer: In the parameter section of the reader, set the dataPath parameter to the path of the non-array data, for example "dataPath":"data.list". This lets the plugin locate the data field that you want to read. Then, set the dataMode parameter to multiData. This setting instructs DataWorks to process the data as multiple separate records, even if the source data is not in array format.

      Note

      In multiData mode, the column configuration is not applicable. You must specify the data path to read directly in the dataPath parameter.

      The following code shows a configuration example for a non-array type for a RestAPI in DataWorks Data Integration:

      "reader": {
        "name": "restapi",
        "parameter": {
          "dataPath": "data.list",
          "dataMode": "multiData"
          // Other parameters
        }
      }
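
The paging answers above can be illustrated with a short Python sketch. It is a simulation under stated assumptions, not the plugin's code: `page_requests` and `fake_api` are hypothetical names, while requestParam, startIndex, endIndex, and step mirror the reader parameters described in the appendix. The loop covers a fixed page range, and pages that return nothing simply contribute no records.

```python
from urllib.parse import parse_qs, urlencode


def page_requests(base_params, request_param, start_index, end_index, step):
    """Yield one query string per loop value; both ends of the range are included."""
    for value in range(start_index, end_index + 1, step):
        params = dict(base_params)
        params[request_param] = value
        yield urlencode(params)


def fake_api(query):
    """Hypothetical stand-in for the RESTful API: only pages 1 and 2 hold data."""
    pages = {"1": [{"id": 1}, {"id": 2}], "2": [{"id": 3}]}
    page = parse_qs(query)["pageNumber"][0]
    return pages.get(page, [])


records = []
for query in page_requests({"abc": 1}, "pageNumber", 1, 4, 1):
    rows = fake_api(query)  # pages 3 and 4 return an empty list
    records.extend(rows)    # an empty page adds nothing; the loop moves on

print(len(records))  # 3
```

Requesting four pages when only two contain data still completes normally, which matches the behavior described for empty subsequent pages.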

Appendix: Script demo and parameter description

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script demo

  • The following code provides a script example:

    {
        "type":"job",
        "version":"2.0",
        "steps":[
            {
                "stepType":"restapi",
                "parameter":{
                    "url":"http://127.0.0.1:5000/get_array5",
                    "dataMode":"oneData",
                    "responseType":"json",
                    "column":[
                        {
                            "type":"long",
                            "name":"a.b"  // Find data from the a.b path.
                        },
                        {
                            "type":"string",
                            "name":"a.c"  // Find data from the a.c path.
                        }
                    ],
                    "dirtyData":"null",
                    "method":"get",
                    "socketTimeout":"60000",
                    "defaultHeader":{
                        "X-Custom-Header":"test header"
                    },
                    "customHeader":{
                        "X-Custom-Header2":"test header2"
                    },
                    "parameters":"abc=1&def=1"
                },
                "name":"restapireader",
                "category":"reader"
            },
            {
                "stepType":"stream",
                "parameter":{
    
                },
                "name":"Writer",
                "category":"writer"
            }
        ],
        "setting":{
            "errorLimit":{
                "record":""
            },
            "speed":{
                "throttle":true,  // If throttle is set to false, the mbps parameter does not take effect and the data rate is not limited. If throttle is set to true, the data rate is limited.
                "concurrent":1,  // The concurrency of the job.
                "mbps":"12"  // The maximum data rate. 1 mbps is equal to 1 MB/s.
            }
        },
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        }
    }
  • The following examples show how to configure the dataPath, dataMode, and column parameters in the code editor:

    After the RestAPI plugin sends an HTTP or HTTPS request, it receives a response body in JSON format. The dataPath parameter specifies the JSON path used to extract data from the body. The following two examples show how to configure the parameters:
    
    
    Example 1: The API returns the following body. The business data is in the DATA field, and the API returns multiple rows of data at a time. DATA is an array.
    {
        "HEADER": {
            "BUSID": "bid1",
            "RECID": "uuid",
            "SENDER": "dc",
            "RECEIVER": "pre",
            "DTSEND": "202201250000"
        },
        "DATA": [
            {
                "SERNR": "sernr1"
            },
            {
                "SERNR": "sernr2"
            }
        ]
    }
    
    If you want to extract multiple rows of data from DATA as multiple synchronization records, set column to "column": [ "SERNR" ], dataMode to "dataMode": "multiData", and dataPath to "dataPath": "DATA".
    
    
    Example 2: The API returns the following body. The business data is in the content.DATA field, and the API returns one row of data at a time. DATA is an object.
    {
        "HEADER": {
            "BUSID": "bid1",
            "RECID": "uuid",
            "SENDER": "dc",
            "RECEIVER": "pre",
            "DTSEND": "202201250000"
        },
        "content": {
            "DATA": {
                "SERNR": "sernr2"
            }
        }
    }
    
    If you want to extract one row of data from content.DATA as one synchronization record, set column to "column": [ "SERNR" ], dataMode to "dataMode": "oneData", and dataPath to "dataPath": "content.DATA".
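
The two examples above can be reproduced with a small Python sketch. It only illustrates how a dot-separated dataPath and the dataMode value select records from a response body; the function name `extract_records` is hypothetical, the plugin's real parsing logic is not shown in this topic, and the multiData branch here assumes that the target is an array.

```python
def extract_records(body, data_path, data_mode):
    """Walk a dot-separated dataPath, then return the selected records as a list."""
    node = body
    for key in data_path.split("."):
        node = node[key]
    if data_mode == "multiData":
        return list(node)  # assumes an array: each element is one record
    if data_mode == "oneData":
        return [node]      # the object itself is the single record
    raise ValueError(f"unsupported dataMode: {data_mode}")


example_1 = {
    "HEADER": {"BUSID": "bid1"},
    "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}],
}
example_2 = {
    "HEADER": {"BUSID": "bid1"},
    "content": {"DATA": {"SERNR": "sernr2"}},
}

rows_1 = extract_records(example_1, "DATA", "multiData")
rows_2 = extract_records(example_2, "content.DATA", "oneData")
print([row["SERNR"] for row in rows_1])  # ['sernr1', 'sernr2']
print([row["SERNR"] for row in rows_2])  # ['sernr2']
```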
                    

Reader script parameters

Note

The following parameters are used when you add a data source and configure a Data Integration node.

The plugin does not support scheduling parameters.

Parameter

Description

Required

Default value

url

The address of the RESTful API.

Yes

None

dataMode

The format of the JSON data returned for a RESTful request.

  • oneData: retrieves one piece of data from the returned JSON data.

  • multiData: retrieves a JSON array from the returned JSON data and passes multiple pieces of data to the writer.

Yes

None

responseType

The format of the returned data. Only JSON is supported.

Yes

JSON

column

The list of fields to read. For each column, the type parameter specifies the type of the source data, and the name parameter specifies the JSON path from which the data for the column is retrieved. Example:

"column":[
    {"type":"long","name":"a.b"},    // Find data from the a.b path.
    {"type":"string","name":"a.c"}   // Find data from the a.c path.
]

You must specify the type and name parameters for each column.

Yes

None

dataPath

The path used to query a single JSON object or a JSON array from the returned result.

No

None

method

The request method. Valid values: get and post.

Yes

None

socketTimeout

The socket timeout period for accessing data from the RESTful API. Unit: milliseconds.

No

60000

customHeader

The header information passed to the RESTful API.

No

None

parameters

The parameter information passed to the RESTful API.

  • For a GET request, enter parameters in the abc=1&def=1 format.

  • For a POST request, enter parameters in JSON format.

No

None

dirtyData

The method for handling a situation where data cannot be found in the specified JSON path of a column.

  • dirty: If a column cannot be found when a record is parsed, the record is marked as dirty data.

  • null: If a column cannot be found when a record is parsed, the value of the column is set to null.

Yes

dirty

requestTimes

The number of times to request data from the RESTful address.

  • single: sends only one request.

  • multiple: sends multiple requests.

Yes

single

requestParam

If you set requestTimes to multiple, you must specify the parameter for the loop, such as pageNumber. The plugin passes the pageNumber parameter to the RESTful API in a loop based on the specified startIndex, endIndex, and step parameters to send multiple requests.

No

None

startIndex

The start index for the loop request. The start index is included in the loop.

No

None

endIndex

The end index for the loop request. The end index is included in the loop.

No

None

step

The step size for the loop request.

No

None

authType

The authentication method. Valid values:

  • Basic Authentication: basic authentication

    If the data source supports username- and password-based authentication, you can select Basic Authentication and configure the username and password that can be used for authentication. During data integration, the username and password are transferred to the RESTful API URL for authentication. The data source is connected only after the authentication is successful.

  • Token Authentication: token-based authentication

    If the data source supports token-based authentication, you can select Token Authentication and configure a fixed token value that can be used for authentication. During data integration, the token is contained in the request header, such as {"Authorization":"Bearer TokenXXXXXX"}, and transferred to the RESTful API URL for authentication. The data source is connected only after the authentication is successful.

    Note

    If you want to use a custom authentication method, you can select Token Authentication and configure a fixed token value in the Token field. The token value can be used for authentication after it is encrypted.

No

None

authUsername/authPassword

The username and password for Basic Authentication.

No

None

authToken

The token for Token Authentication.

No

None

accessKey/accessSecret

The account information for Aliyun API signature authentication.

No

None
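
To make the interaction between column and dirtyData concrete, the following Python sketch resolves each column's JSON path in a record and applies the two dirtyData options when a path is missing. It is a simplified model under assumptions: the helper names `lookup` and `parse_record` are hypothetical, and type conversion is left out.

```python
MISSING = object()  # sentinel: distinguishes "path not found" from a JSON null


def lookup(record, path):
    """Follow a dot-separated path; return MISSING if any segment is absent."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return MISSING
        node = node[key]
    return node


def parse_record(record, columns, dirty_data):
    """Return (row, is_dirty) for one record."""
    row = []
    for column in columns:
        value = lookup(record, column["name"])
        if value is MISSING:
            if dirty_data == "dirty":
                return None, True  # the whole record is marked as dirty data
            value = None           # dirtyData is "null": fill the column with null
        row.append(value)
    return row, False


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
record = {"a": {"b": 7}}  # the a.c path is absent

print(parse_record(record, columns, "null"))   # ([7, None], False)
print(parse_record(record, columns, "dirty"))  # (None, True)
```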

Writer script demo

{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"stream",
            "parameter":{

            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/writer1",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long",
                        "name":"a.b"  // Place column data in the a.b path.
                    },
                    {
                        "type":"string",
                        "name":"a.c"  // Place column data in the a.c path.
                    }
                ],
                "method":"post",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1",
                "batchSize":256
            },
            "name":"restapiwriter",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" // The number of error records.
        },
        "speed":{
            "throttle":true,  // If throttle is set to false, the mbps parameter does not take effect and the data rate is not limited. If throttle is set to true, the data rate is limited.
            "concurrent":1,  // The concurrency of the job.
            "mbps":"12"  // The maximum data rate. 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
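
As a rough picture of what the writer's column setting means, the Python sketch below places each value of an incoming row at the JSON path named by its column, using the a.b and a.c paths from the script above. The helper name `build_body` is hypothetical, and the exact request body that the plugin sends may differ.

```python
import json


def build_body(columns, row):
    """Place each row value at the dot-separated path given by its column name."""
    body = {}
    for column, value in zip(columns, row):
        *parents, leaf = column["name"].split(".")
        node = body
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate objects as needed
        node[leaf] = value
    return body


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
body = build_body(columns, [42, "hello"])
print(json.dumps(body))  # {"a": {"b": 42, "c": "hello"}}
```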

Writer script parameters

Parameter

Description

Required

Default value

url

The address of the RESTful API.

Yes

None

dataMode

The format of the JSON data passed in a RESTful request.

  • oneData: passes one record per request. The number of requests is the same as the number of records.

  • multiData: passes a batch of records per request. The number of requests is determined by the number of tasks split on the reader side.

Yes

None

column

The list of field paths for the generated JSON data. For each column, the type parameter specifies the type of the source data, and the name parameter specifies the JSON path where the data for the column is placed. Example:

"column":[
    {"type":"long","name":"a.b"},    // Place column data in the a.b path.
    {"type":"string","name":"a.c"}   // Place column data in the a.c path.
]

Note

You must specify the type and name parameters for each column.

Yes

None

dataPath

The path of the JSON object where the data result is placed.

No

None

method

The request method. Valid values: post and put.

Yes

None

customHeader

The header information passed to the RESTful API.

No

None

authType

The authentication method.

  • Basic Authentication: basic authentication

    If the data source supports username- and password-based authentication, you can select Basic Authentication and configure the username and password that can be used for authentication. During data integration, the username and password are transferred to the RESTful API URL for authentication. The data source is connected only after the authentication is successful.

  • Token Authentication: token-based authentication

    If the data source supports token-based authentication, you can select Token Authentication and configure a fixed token value that can be used for authentication. During data integration, the token is contained in the request header, such as {"Authorization":"Bearer TokenXXXXXX"}, and transferred to the RESTful API URL for authentication. The data source is connected only after the authentication is successful.

    Note

    If you want to use a custom authentication method, you can select Token Authentication and configure a fixed token value in the Token field. The token value can be used for authentication after it is encrypted.

No

None

authUsername/authPassword

The username and password for Basic Authentication.

No

None

authToken

The token for Token Authentication.

No

None

accessKey/accessSecret

The account information for Aliyun API signature authentication.

No

None

batchSize

The maximum number of records in a single request when dataMode is set to multiData.

Yes

512
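
The difference between the two writer dataMode values, and the role of batchSize, can be sketched in Python as follows. This models a single stream of records only; the function name `plan_requests` is hypothetical, and in a real task the number of multiData requests also depends on how the reader splits the task.

```python
def plan_requests(records, data_mode, batch_size=512):
    """Group records into the payloads that would be sent, one list per request."""
    if data_mode == "oneData":
        return [[record] for record in records]  # one record per request
    if data_mode == "multiData":
        # At most batch_size records per request.
        return [
            records[i:i + batch_size]
            for i in range(0, len(records), batch_size)
        ]
    raise ValueError(f"unsupported dataMode: {data_mode}")


records = [{"id": n} for n in range(600)]
one = plan_requests(records, "oneData")
multi = plan_requests(records, "multiData", batch_size=256)
print(len(one), [len(batch) for batch in multi])  # 600 [256, 256, 88]
```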