The DataWorks data upload feature lets you upload data from local files, DataAnalysis workbooks, Object Storage Service (OSS) files, and HTTP files to engines such as MaxCompute, EMR Hive, Hologres, and StarRocks for analysis and management. This feature provides a convenient data transmission service to help you quickly use data to drive your business. This topic describes how to use the data upload feature.
Precautions
If you perform cross-border data uploads, such as transferring data from mainland China to outside mainland China or between different countries or regions, read the related compliance statement in advance. Otherwise, the data upload may fail, and you will be held legally responsible.
Before you upload data, set the table headers to English. If the table headers are in Chinese, parsing may fail and cause an upload error.
Limits
Resource group limits: The data upload feature requires you to specify a resource group for scheduling and a Data Integration resource group.
Only Serverless resource groups (recommended), exclusive resource groups for scheduling, and exclusive resource groups for Data Integration are supported. You must configure a resource group for scheduling and a Data Integration resource group for the corresponding engine in System Administration.
The selected resource group must be attached to the DataWorks workspace where the destination table resides. Ensure that the data source used by the data upload task can connect to the selected resource group over the network.
Note: To configure resource groups for an engine in DataAnalysis, see System administration.
To establish a network connection between a data source and a resource group, see Network connection solutions.
To attach an exclusive resource group to a workspace, see Use an exclusive resource group for scheduling and Use an exclusive resource group for Data Integration.
Table limits: You can upload data only to tables that you own. You own a table in either of the following scenarios:
The table details page in Data Map shows that you are the Table Owner. For more information about how to view table details, see View table details.
The table is a new table that you created when you uploaded data using the data upload feature.
Billing
Data upload incurs the following fees:
Data transmission fees.
If you create a new table, computing and storage fees are charged.
The preceding fees are charged by the respective engines. For specific fees, see the billing documentation for the corresponding engine: MaxCompute billing, Hologres billing, E-MapReduce billing, and EMR Serverless StarRocks product billing.
Go to the data upload page
Go to the Upload and Download page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose DataAnalysis > Upload and Download. On the page that appears, click Go to Data Upload and Download.
In the left-side navigation pane, click the upload icon to go to the Data Upload page. Click Data Upload and follow the on-screen instructions to upload the data.
Select the file data to upload
You can upload data from local files, workbooks, OSS, and HTTP files. Select a data source as needed.
When you upload a file, specify whether to filter out dirty data as needed. The two behaviors are illustrated in the sketch after the following options.
Yes: If dirty data is encountered, the platform automatically ignores it and continues to upload the data.
No: If dirty data is encountered, the platform does not ignore it, and the data upload is interrupted.
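The two options behave like the following Python sketch. This is a conceptual illustration only, not DataWorks code; parse_row is a hypothetical parser that raises an error for a dirty row.

```python
# Conceptual sketch of the two dirty-data behaviors (not DataWorks code).
def parse_row(raw):
    # Hypothetical parser: raises ValueError if the row cannot be converted.
    cells = raw.split(",")
    return [int(cells[0]), cells[1]]

def upload(rows, filter_dirty):
    uploaded = []
    for raw in rows:
        try:
            uploaded.append(parse_row(raw))
        except ValueError:
            if filter_dirty:
                continue  # "Yes": ignore the dirty row and keep uploading
            raise         # "No": interrupt the whole upload
    return uploaded

rows = ["1,a", "oops,b", "3,c"]
print(upload(rows, filter_dirty=True))   # [[1, 'a'], [3, 'c']]
# upload(rows, filter_dirty=False) would raise and stop the upload
```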
Local file
If the data that you want to upload is in a local file, select this method.
Set Data Source to Local File.
Specify Data To Upload: Drag your local files to the Select File area.
Note: The supported file formats are CSV, XLS, XLSX, and JSON. The maximum file size is 5 GB for CSV files and 100 MB for the other file formats.
By default, only the first sheet of a file is uploaded. To upload multiple sheets from a file, you must create a table for each sheet and make that sheet the first sheet of the file before you upload it.
Uploading files in SQL format is not supported.
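Because format and size violations otherwise surface only after the upload starts, a quick local pre-check can save time. A minimal sketch, assuming the limits quoted in the note above; the file name is hypothetical.

```python
import os

# Size limits quoted in the note above (illustrative client-side check only).
LIMITS = {
    ".csv": 5 * 1024**3,    # 5 GB for CSV files
    ".xls": 100 * 1024**2,  # 100 MB for the other supported formats
    ".xlsx": 100 * 1024**2,
    ".json": 100 * 1024**2,
}

def check_local_file(path: str) -> None:
    ext = os.path.splitext(path)[1].lower()
    if ext not in LIMITS:  # for example, .sql files are not supported
        raise ValueError(f"unsupported file format: {ext}")
    if os.path.getsize(path) > LIMITS[ext]:
        raise ValueError(f"{path} exceeds the {ext} size limit")

# check_local_file("sales_2024.csv")  # hypothetical file name
```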
Workbook
If the data that you want to upload is in a DataWorks DataAnalysis workbook, select this method.
Set Data Source to Workbook.
Specify Data To Upload:
From the drop-down list next to Select File, select the workbook file to upload.
If the workbook does not exist, click the New button next to it to create one. You can also go to the DataAnalysis module to create a workbook and import data.
Object Storage Service (OSS)
If the data that you want to upload is in Object Storage Service (OSS), select this method.
Prerequisites:
You have created an OSS bucket and stored the data to be uploaded in it. You can then upload the data from OSS to the corresponding destination engine.
To avoid permission issues, use Resource Access Management (RAM) to grant the Alibaba Cloud account that you use to upload data the permissions to access the destination bucket before you upload the data.
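To confirm that the account actually has access before you start the upload, you can list a few objects with the OSS Python SDK (oss2). A sketch with placeholder credentials, endpoint, and bucket name:

```python
import oss2  # Alibaba Cloud OSS Python SDK

# Placeholders: substitute your own credentials, endpoint, and bucket name.
auth = oss2.Auth("<access_key_id>", "<access_key_secret>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<bucket_name>")

# Listing a few objects fails fast if the RAM grant is missing or incomplete.
for obj in oss2.ObjectIterator(bucket, max_keys=5):
    print(obj.key, obj.size)
```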
Steps:
Set Data Source to Object Storage OSS.
Specify The Data To Upload:
From the Select Bucket drop-down list, select the destination OSS bucket that stores the data to upload.
Note: You can upload data only from a bucket that is in the same region as the current DataWorks workspace.
In the Select File area, select the file data that you want to upload.
Note: Only files in CSV, XLS, XLSX, and JSON formats are supported.
HTTP file
If the data that you want to upload is in an HTTP file, select this method.
Set Data Source to HTTP File.
Specify Data To Upload:
| Parameter | Configuration description |
| --- | --- |
| File Address | The address where the file data is stored. Note: Addresses in HTTP and HTTPS formats are supported. |
| File Type | The file type is automatically detected based on the file that you upload. CSV, XLS, and XLSX formats are supported. The maximum size of a CSV file is 5 GB; the maximum size of other files is 50 MB. |
| Request Method | GET, POST, and PUT are supported. GET is recommended, but the method to use depends on the request methods that your file server allows. |
| Advanced Parameters | You can also set the Request Header and Request Body in the Advanced Parameters section as needed. |
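Before you submit the task, you can verify that the file address is reachable and that the file fits the size limit. A sketch using the Python requests library; the URL is a placeholder, and servers that do not allow HEAD can be probed with a streamed GET instead.

```python
import requests

url = "https://example.com/exports/data.csv"  # placeholder file address

# HEAD tests reachability and size without downloading the body.
resp = requests.head(url, allow_redirects=True, timeout=10)
resp.raise_for_status()

size = int(resp.headers.get("Content-Length", "0"))
print(f"reachable, Content-Length: {size} bytes")
if url.endswith(".csv") and size > 5 * 1024**3:
    print("warning: over the 5 GB limit for CSV files")
```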
Set the destination table
In the Set Destination Table section, select a Destination Engine for the data upload and configure the related parameters for the selected engine.
When you set the destination table, distinguish between the production (PROD) and development (DEV) environments when you select a data source. If you select the wrong environment, the data is uploaded to the other environment.
MaxCompute
To upload data to a MaxCompute table, configure the following parameters.
| Parameter | Configuration description |
| --- | --- |
| MaxCompute project name | Select a MaxCompute data source that is attached to the current region. If the data source that you want to use is not found, you can attach a MaxCompute compute resource to the current workspace to generate a data source with the same name. |
| Destination table | Select Existing Table or New Table. |
| Select destination table | The table where the data is stored. You can search for the table by keyword. Note: You can upload data only to tables that you own. For more information, see Limits. |
| Upload mode | Select the method that is used to add the data to the destination table. |
| Table name | Enter a custom name for the new table. Note: When a new table is created for the MaxCompute engine, the MaxCompute account information configured for the DataWorks compute resource is used, and the table is created in the corresponding MaxCompute project. |
| Table type | Select Non-partitioned Table or Partitioned Table as needed. If you select Partitioned Table, specify the partition fields and their values. |
| Lifecycle | Specify the lifecycle of the table. After the table expires, it may become unavailable. For more information about table lifecycles, see Lifecycle and Lifecycle action. |
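When you select New Table, DataWorks issues the table creation for you. As a point of reference, the result is conceptually similar to creating the table yourself with the PyODPS SDK; the credentials, project, endpoint, table name, and schema below are all placeholders.

```python
from odps import ODPS  # PyODPS, the MaxCompute Python SDK

# Placeholders: substitute your own credentials, project, and endpoint.
o = ODPS("<access_key_id>", "<access_key_secret>",
         project="<project_name>",
         endpoint="<maxcompute_endpoint>")

# A partitioned table with a 30-day lifecycle, mirroring the options above.
o.create_table(
    "uploaded_sales",                           # hypothetical table name
    ("id bigint, amount double", "pt string"),  # (columns, partition fields)
    if_not_exists=True,
    lifecycle=30,                               # days before the table expires
)
```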
EMR Hive
To upload data to an EMR Hive table, configure the following parameters.
| Parameter | Configuration description |
| --- | --- |
| Data source | Select an EMR Hive data source (Alibaba Cloud instance mode) that is attached to the workspace in the current region. |
| Destination table | You can upload data only to an Existing Table. |
| Select destination table | The table where the data is stored. You can search for the table by keyword. Note: You can upload data only to tables that you own. For more information, see Limits. |
| Upload mode | Select the method that is used to add the data to the destination table. |
Hologres
To upload data to a Hologres table, configure the following parameters.
| Parameter | Configuration description |
| --- | --- |
| Data source | Select a Hologres data source that is attached to the workspace in the current region. If the data source that you want to use is not found, you can attach a Hologres compute resource to the current workspace to generate a data source with the same name. |
| Destination table | You can upload data only to an Existing Table. |
| Select destination table | The table where the data is stored. You can search for the table by keyword. Note: You can upload data only to tables that you own. For more information, see Limits. |
| Upload mode | Select the method that is used to add the data to the destination table. |
| Primary key conflict policy | If a data upload causes a primary key conflict in the destination table, select the policy to apply to the conflicting rows. |
StarRocks
To upload data to a StarRocks table, configure the following parameters.
| Parameter | Configuration description |
| --- | --- |
| Data source | Select a StarRocks data source that is attached to the workspace in the current region. |
| Destination table | You can upload data only to an Existing Table. |
| Select destination table | The table where the data is stored. You can search for the table by keyword. Note: You can upload data only to tables that you own. For more information, see Limits. |
| Upload mode | Select the method that is used to add the data to the destination table. |
| Advanced parameters | You can configure Stream Load request parameters. A hedged example follows this table. |
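The Advanced Parameters correspond to StarRocks Stream Load request parameters, which are passed as HTTP headers. The following are commonly used Stream Load parameters; whether each one is exposed here is an assumption, so check the StarRocks Stream Load documentation and the on-screen options for what is actually supported.

```python
# Commonly used StarRocks Stream Load parameters (passed as HTTP headers).
# Which of these the Advanced Parameters section exposes is an assumption.
stream_load_params = {
    "format": "csv",            # input data format
    "column_separator": ",",    # field delimiter for CSV input
    "max_filter_ratio": "0.1",  # tolerate up to 10% filtered (dirty) rows
    "timeout": "600",           # load timeout, in seconds
}
print(stream_load_params)
```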
Preview the data to upload
After you set the destination table, you can adjust the file encoding and data mapping based on the data preview.
You can preview only the first 20 rows of data.
File Encoding: If the data contains garbled text, you can switch the encoding format. UTF-8, GB18030, Big5, UTF-16LE, and UTF-16BE are supported.
Preview data and set destination table fields:
Upload data to an existing table: You must configure the mapping between the columns in the source file and the fields in the destination table before the data can be uploaded. You can select Map By Column Name or Map By Position; the difference is illustrated in the sketch after this list. After the mapping is complete, you can also customize the field names in the destination table.
Note: If a column in the source data is not mapped to a field in the destination table, the data in that column is grayed out and is not uploaded.
A column in the source data cannot be mapped to multiple fields in the destination table.
The field name and field type cannot be empty. Otherwise, the data cannot be uploaded.
Upload data to a new table: You can use Smart Field Generation to automatically fill in field information, or you can manually modify the field information.
Note: The field name and field type cannot be empty. Otherwise, the data cannot be uploaded.
The EMR Hive, Hologres, and StarRocks engines do not support creating a new table during data upload.
Ignore First Row: Specify whether to upload the first row of the file data, which is usually the column names, to the destination table.
Selected: If the first row of the file contains column names, the first row is not uploaded to the destination table.
Not selected: If the first row of the file contains data, the first row is uploaded to the destination table.
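The two mapping modes, together with the File Encoding and Ignore First Row options, behave like the following self-contained Python sketch; the sample data and field names are illustrative.

```python
import csv
import io

# Illustrative source data; in practice this is your uploaded file, read
# with the encoding chosen under File Encoding (for example, gb18030).
source = io.StringIO("name,id,amount\nwidget,1,9.5\ngadget,2,3.0\n")

reader = csv.reader(source)
header = next(reader)  # "Ignore First Row": the header row is not uploaded
rows = list(reader)

dest_fields = ["id", "name", "amount"]  # destination table fields (illustrative)

# Map By Column Name: match source columns to destination fields by name.
by_name = [{f: row[header.index(f)] for f in dest_fields if f in header}
           for row in rows]

# Map By Position: the i-th source column feeds the i-th destination field.
by_position = [dict(zip(dest_fields, row)) for row in rows]

print(by_name[0])      # {'id': '1', 'name': 'widget', 'amount': '9.5'}
print(by_position[0])  # {'id': 'widget', 'name': '1', 'amount': '9.5'}
```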
Upload the data
After you preview the data, click the Data Upload button in the lower-left corner to upload the data.
What to do next
After the data is uploaded, you can click the Data Upload icon in the left-side navigation pane to return to the Data Upload page. Find the data upload task that you created and perform the following operations as needed:
Continue upload: In the Actions column, click Continue Upload to upload the data again.
Query data: In the Actions column, click Query Data to query and analyze the data.
View upload data details: Click the destination Table Name to go to Data Map and view the detailed information of the destination table. For more information, see General data query and management.
Appendix: Compliance statement for cross-border data upload
Cross-border data operations will cause your business data in the cloud to be transferred to the region or product deployment area that you select. You must ensure that such operations comply with the following requirements:
You have the right to process the relevant business data in the cloud.
You have adopted sufficient data security protection technologies and policies.
The data transfer complies with the requirements of relevant laws and regulations. For example, the transferred data does not contain any content that is restricted or prohibited from being transferred or disclosed by applicable laws.
Alibaba Cloud reminds you that if your data upload operation may result in cross-border data transfer, you should consult with professional legal or compliance personnel before you perform the operation. Ensure that the cross-border data transfer complies with the requirements of applicable laws, regulations, and regulatory policies. For example, you must obtain valid authorization from personal information subjects, complete the signing and filing of relevant contract clauses, and complete relevant security assessments and other legal obligations.
If you perform cross-border data operations without complying with this statement, you will bear the corresponding legal consequences. You are also liable for any losses incurred by Alibaba Cloud and its affiliates.
References
DataStudio also supports uploading data from local CSV or text files to MaxCompute tables. For more information, see Upload data.
For more information about operations on MaxCompute tables, see Create and use a MaxCompute table.
For more information about operations on Hologres tables, see Create a Hologres table.
For more information about operations on EMR tables, see Create an EMR table.
FAQ
Resource group configuration issue.
Error message: The current file source or destination engine requires a resource group to be configured for data upload. Contact the workspace administrator to configure a resource group.
Solution: To configure resource groups for an engine in DataAnalysis, see System administration.
Resource group attachment issue.
Error message: The global data upload resource group configured for your current workspace is not attached to the workspace to which the upload table belongs. Contact the workspace administrator to attach it.
Solution: You can attach the resource group that you set in System Administration to the workspace.