DataWorks provides Tinder Object Storage (TOS) Reader for you to read data from files that are stored in TOS. TOS Reader accesses the files, parses the data that they contain, and synchronizes the data to a destination. This topic describes the capabilities of synchronizing data from TOS data sources.
Supported data types
The following table describes the data types that are supported in DataWorks by TOS data sources.
Data type | Description |
STRING | Text. |
LONG | Integer. |
BYTES | Byte array. The text that is read is converted into a byte array. |
BOOL | Boolean. |
DOUBLE | Floating point. |
DATE | Date and time. |
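The values in this table correspond to the values that you can specify for the type field in the column configuration of TOS Reader. The following fragment is a minimal sketch of such a mapping; the column indexes and the constant value are illustrative and depend on the layout of your source file.
"column": [
    { "index": 0, "type": "long" },
    { "index": 1, "type": "boolean" },
    { "index": 2, "type": "double" },
    { "index": 3, "type": "string" },
    { "index": 4, "type": "date" },
    { "value": "static_tag", "type": "string" }
]
The last entry uses the value parameter, which is described in the parameter table later in this topic, to generate a constant column instead of reading the value from the source file.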
Add a TOS data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
TOS data sources can be used only as sources in batch synchronization tasks to synchronize data of a single table. This section describes the entry point for and the procedure of configuring a data synchronization task.
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for TOS Reader
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "tos",
            "parameter": {
                "datasource": "",
                "object": ["f/z/1.csv"],
                "fileFormat": "csv",
                "encoding": "utf8/gbk/...",
                "fieldDelimiter": ",",
                "useMultiCharDelimiter": true,
                "skipHeader": true,
                "compress": "zip/gzip",
                "column": [
                    {
                        "index": 0,
                        "type": "long"
                    },
                    {
                        "index": 1,
                        "type": "boolean"
                    },
                    {
                        "index": 2,
                        "type": "double"
                    },
                    {
                        "index": 3,
                        "type": "string"
                    },
                    {
                        "index": 4,
                        "type": "date"
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 1
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
Parameters in code for TOS Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
fileFormat | The format of the source file. Valid values include csv, text, and parquet. | Yes | No default value |
object | The file path. This parameter supports asterisks (*) as wildcards in the path. | Required for specific fileFormat values | No default value |
column | The columns from which you want to read data. The type parameter specifies the data type of the source column. The index parameter specifies the ID of the column in the source file. Column IDs start from 0. The value parameter specifies the value of a constant column that is automatically generated instead of being read from the source. Note: For the column parameter, you must configure the type parameter and either the index or value parameter. | Yes | "column": ["*"] |
fieldDelimiter | The column delimiter that is used in the file from which you want to read data. | Yes | , |
lineDelimiter | The row delimiter that is used in the file from which you want to read data. Note: This parameter takes effect only when the fileFormat parameter is set to text. | No | No default value |
compress | The format in which files are compressed. By default, this parameter is left empty, which indicates that files are not compressed. Supported formats include zip and gzip. | No | No default value |
encoding | The encoding format of the file from which you want to read data. | No | utf-8 |
nullFormat | The string that represents a null value. No standard string represents a null value in TXT files. You can use this parameter to define a string that represents a null value. | No | No default value |
skipHeader | Specifies whether to skip the headers in a CSV file. Valid values: true and false. Note: The skipHeader parameter is unavailable for compressed files. | No | false |
parquetSchema | The schema of the Parquet files that you want to read. If you set the fileFormat parameter to parquet, you must configure the parquetSchema parameter. Make sure that the entire script complies with the JSON syntax. | No | No default value |
csvReaderConfig | The configurations required to read CSV files. The parameter value must be of the MAP type. A CSV file reader is used to read data from CSV files. If you do not configure this parameter, the default configurations are used. | No | No default value |
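For reference, the following fragments sketch how some of these parameters might appear in the parameter object of the Reader step. They are minimal sketches: the data source name, file paths, schema fields, nullFormat string, and csvReaderConfig options are illustrative assumptions rather than values prescribed by this topic.
A CSV-oriented configuration that uses a wildcard in object, skips headers, and passes reader options through csvReaderConfig:
"parameter": {
    "datasource": "my_tos_source",
    "object": ["f/z/*.csv"],
    "fileFormat": "csv",
    "fieldDelimiter": ",",
    "skipHeader": true,
    "nullFormat": "\\N",
    "csvReaderConfig": {
        "safetySwitch": false,
        "skipEmptyRecords": false,
        "useTextQualifier": false
    }
}
A Parquet-oriented configuration that supplies the required parquetSchema:
"parameter": {
    "datasource": "my_tos_source",
    "object": ["f/z/1.parquet"],
    "fileFormat": "parquet",
    "parquetSchema": "message m { optional int64 id; optional double amount; optional binary name (UTF8); }"
}
As with the full example earlier in this appendix, these fragments replace only the parameter object of the Reader step; the other parts of the script remain unchanged.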