DataWorks Notebooks support multiple cell types and provide an interactive, modular analysis environment to help you efficiently process and analyze data, create visualizations, and build models.
Feature overview
In DataWorks, you can use Notebook nodes to build an interactive, modular, and reusable analysis environment.
Multi-engine development: DataWorks Notebooks include a SQL Cell feature that supports SQL development and analysis on multiple big data engines.
Interactive analysis:
Interactive SQL queries: You can create widgets in Python to visually select or set parameter values, and then reference these parameters and their values in SQL to enable interactive queries between Python and SQL (see the sketch after this list).
Write SQL query results to a DataFrame: You can store SQL query results directly in a Pandas DataFrame or MaxFrame DataFrame object and pass these results as variables to subsequent cells.
Generate visual charts: You can read the DataFrame variable in a Python cell to plot charts based on the data. This creates an efficient interaction between Python and SQL.
Integrated big data and AI development: In a DataWorks Notebook, you can use libraries such as Pandas for data cleaning and preparation to ensure that the data meets the input requirements of your algorithm models. You can then use the cleaned data to easily develop, train, and evaluate your models. This provides a seamless connection between big data and AI.
Intelligent code generation: DataWorks Notebooks have a built-in intelligent programming assistant that supports generating SQL and Python code with DataWorks Copilot to improve development efficiency.
Attach datasets: You can add a dataset to a DataWorks Notebook. This allows the node to read data from OSS or NAS, or write files to OSS or NAS, at runtime.
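The interactive SQL query pattern can be sketched as follows. This is a minimal example, assuming the ipywidgets library is installed in the personal development environment; the widget name region_widget and its option values are illustrative.

```python
import ipywidgets as widgets

# A dropdown widget whose selected value can be referenced as a
# parameter by later cells (widget and option names are illustrative).
region_widget = widgets.Dropdown(
    options=['East China 2 (Shanghai)', 'North China 2 (Beijing)'],
    description='Region:'
)
region_widget  # Display the widget so that a value can be selected interactively.
```

A subsequent SQL cell can then reference the selected value (for example, region_widget.value) as a query parameter.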
Prerequisites
A workspace that uses the new version of Data Studio is created.
A Serverless resource group is available. For more information, see Use a Serverless resource group.
A personal development environment instance is created. Running a Notebook in DataStudio requires a personal development environment instance. For more information, see Create a personal development environment instance.
Notes
When you run this task on a Serverless resource group, a single task supports a maximum of 64 CUs. However, we recommend that you do not exceed 16 CUs, to prevent resource shortages caused by excessive CU usage from affecting task startup.
Supported cell types
SQL cell:
Supported SQL types: MaxCompute SQL, Hologres SQL, EMR Spark SQL, StarRocks SQL, Flink SQL Batch, and Flink SQL Streaming.
Supported computing resources: MaxCompute, Hologres, EMR Serverless Spark, EMR Serverless StarRocks, and Fully Managed Flink.
Python cell.
Markdown cell.
Create a personal development environment instance
Notebooks run on personal development environment instances. Before you start, you must create and switch to a target instance. You can install dependencies for Notebook node development, such as third-party Python libraries, in a personal development environment instance.
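For example, assuming the instance's kernel supports the standard Jupyter %pip magic, you can install a third-party library directly from a Notebook cell:

```python
# Install a third-party Python library into the current kernel environment.
%pip install pandas
```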
Create a Notebook node
Go to the Data Studio (New Version) page.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and, in the Actions column, go to Data Studio.
Create a Notebook.
In DataWorks, you can create a Notebook in the Project Folder, My Folder, or under One-Time Tasks.
In the navigation pane on the left, click the icon to go to the Data Development page. You can create a Notebook in the Project Folder or My Folder.
Create a Notebook in the Project Folder:
Click the icon and select Notebook to create a new Notebook. If you have already created a working directory, right-click the directory name and create a new Notebook from the context menu.
If you have already created a workflow, you can add a Notebook node when editing the workflow.
Create a Notebook in My Folder:
Click the icon to create a new Notebook file. You can also click the icon and add a file in .ipynb format to create a new Notebook. If you have already created a folder, right-click the folder name and choose New Notebook to create a new Notebook.
Create a Notebook under One-Time Tasks:
In the navigation pane on the left, click the icon to go to the One-Time Tasks page. Under One-Time Tasks, click the icon and create a new Notebook.
Develop a Notebook node

1. Add a cell
In the Notebook node toolbar, you can click the SQL, Python, or Markdown button to quickly create the corresponding cell type. You can also quickly add a new cell above or below a specific cell in the code editor.
Add a cell above the current cell: Hover over the top edge of a cell to display the add button. Click the button to insert a new cell above the current one.
Add a cell below the current cell: Hover over the bottom edge of a cell to display the add button. Click the button to insert a new cell below the current one.
To reorder cells, hover over the blue line in front of a cell, and then drag it to a new position.
2. (Optional) Switch the cell type
In a cell, you can click the Cell Type button in the lower-right corner to switch between cell types. For more information about cell types, see Supported cell types.
You can change a SQL cell from a MaxCompute SQL cell to a Hologres SQL cell or another SQL cell type.
You can change a SQL cell to a Python or Markdown cell, or switch a Python or Markdown cell to a SQL cell.
When you switch the cell type, the content is retained. You must manually adjust the code in the cell to match the new type.
3. Develop cell code
You can edit SQL, Python, and Markdown code in the corresponding cells. When you develop code in a SQL cell, ensure that the SQL syntax matches the selected SQL cell type, which corresponds to the computing resource type. You can use DataWorks Copilot for programming assistance. You can access the intelligent assistant in the following ways:
From the cell toolbar: Click the icon in the upper-right corner of the cell to open the Copilot chat box in the editor.
From the cell's context menu: Right-click the cell and choose the Copilot option for programming assistance.
Using a keyboard shortcut:
macOS: Press Command+I to open the intelligent assistant chat box.
Windows: Press Ctrl+I to open the intelligent assistant chat box.
Run a Notebook
1. Select a personal development environment
When you run a Notebook directly in DataStudio, the Python cells in the Notebook run based on a personal development environment. Therefore, you must select a created personal development environment instance at the top of the page as the runtime environment for the Notebook.
2. Confirm or switch the Python kernel
Click the icon in the upper-right corner of the Notebook node to confirm the Python kernel version for the current Python cell, or to switch to another Python kernel version.
3. (Optional) Select a computing resource
SQL cell: Click the icon in the lower-right corner of the SQL cell to specify an attached computing resource. When you run the cell, the SQL statement is executed on the specified computing resource.
Python cell: By default, a Python cell runs on the kernel of the personal development environment instance. To access a specific computing resource service, you can also use a built-in Magic Command to connect to a MaxCompute computing resource.
4. Run Notebook cells
After you finish developing the Notebook cells, you can test all cells or run a single cell.
Run all cells: After editing the Notebook, click the icon at the top to test and run all cells in the Notebook node.
Run a single cell: After editing a cell within the Notebook, click the icon to the left of the cell to test and run it.
5. View the results
SQL cell
You can write various types of SQL scripts in a cell. After you run a SQL script, the results are printed below the cell.
Scenario 1: If the SQL does not contain a SELECT statement, only the run log is displayed by default after the cell is executed.

```sql
CREATE TABLE IF NOT EXISTS product (
    product_id BIGINT,
    product_name STRING,
    product_type STRING,
    price DECIMAL(10, 2)
) LIFECYCLE 30; -- The data lifecycle is 30 days. Data is automatically deleted after this period. This setting is optional.
```

Scenario 2: If the SQL contains a SELECT statement, the run log is displayed, and the results can be viewed in two ways: as a table or as a visual chart. The system also automatically generates a DataFrame variable from the query results.
```sql
SELECT product_id, product_name, product_type, price FROM product;
```

Generate a DataFrame data object:
The SQL cell automatically generates a return variable. You can click the df_* variable name in the lower-left corner of the SQL cell to rename the generated DataFrame variable.
View the SQL query table:
After the SQL query runs, the results are displayed in a table in the log area by default.
View the visual chart for the SQL query:
After the SQL query runs, click the icon on the left side of the log area to view a visual chart of the data generated by the query.
Python cell
You can write Python scripts in a cell. After you run a Python script, the results are printed below the cell.
Scenario 1: Print only text output.

```python
print("Hello World")
```

Scenario 2: Use a Pandas DataFrame.

```python
import pandas as pd

# Define product data: product name, region, and login frequency.
product_data = {
    'Product_Name': ['DataWorks', 'RDS MySQL', 'EMR Spark', 'MaxCompute'],
    'Product_Region': ['East China 2 (Shanghai)', 'North China 2 (Beijing)', 'South China 1 (Shenzhen)', 'Hong Kong'],
    'Login_Frequency': [33, 22, 11, 44]
}

# Create a DataFrame from the given data.
df_products = pd.DataFrame(product_data)

# Print the DataFrame to show the product information.
print(df_products)
```

Scenario 3: Plot a chart.

```python
import matplotlib.pyplot as plt

# Data
categories = ['DataWorks', 'RDS MySQL', 'MaxCompute', 'EMR Spark', 'Hologres']
values = [23, 45, 56, 78, 30]

# Create a bar chart
plt.figure(figsize=(10, 6))
plt.bar(categories, values, color=['blue', 'green', 'red', 'purple', 'orange'])

# Add a title and labels
plt.title('Example Bar Chart')
plt.xlabel('Category')
plt.ylabel('Value')

# Show the chart
plt.show()
```
Markdown cell
After you finish writing, click the icon to display the formatted Markdown text. For example:

```markdown
# DataWorks Notebook
```

In a Markdown cell that is already displaying formatted text, click the icon to continue editing the cell.
What to do next: Publish the node
Configure scheduling: If a Notebook in the Project Folder needs to run on a recurring schedule in the production environment, you must configure its scheduling properties. For example, you can specify a recurring schedule time.
By default, Notebooks in the Project Folder, My Folder, or under One-Time Tasks run on the kernel of your personal development environment. When you publish a Notebook to the production environment, the system uses the image environment that you selected in the scheduling configuration. Before you publish the Notebook, ensure that the selected image contains the necessary dependencies for the Notebook node to run. You can create a DataWorks image from a personal development environment to use for scheduling.
Publish the node: A Notebook node runs according to its scheduling configuration only after it is published to the production environment. You can publish a node to the production environment in the following ways.
Publish a Notebook from the Project Folder: Save the Notebook, and then click Publish to publish it. After publishing, you can view the Notebook task in the Operation Center.
Publish a Notebook from My Folder: Save the Notebook, and then click the icon to submit the Notebook from My Folder to the Project Folder. Then, click Publish to publish the Notebook. After publishing, you can view the Notebook task in the Operation Center.
Publish a Notebook from One-Time Tasks: Save the Notebook, and then click Publish to publish it. After publishing, you can view the Notebook task in the Operation Center.
Unpublish a task: To unpublish a Notebook, right-click the node, select Delete, and follow the on-screen instructions to unpublish or delete the Notebook.
Scenarios and practices
Use built-in Magic Commands to connect to a MaxCompute computing resource
In a Python cell, you can use built-in Magic Commands to connect to a MaxCompute computing resource. This avoids the need to repeatedly define connection information and plaintext AccessKey information in Python.
Before you connect to a MaxCompute computing resource, ensure that you have attached a MaxCompute (ODPS) computing resource.
Scenario 1: Establish a MaxCompute MaxFrame Session connection
When developing in a Python cell, you can use the following built-in Magic Command to open the MaxCompute computing resource selector and access the MaxCompute MaxFrame service.
Use a Magic Command to connect to and access a MaxCompute MaxFrame Session:

```python
mf_session = %maxframe
```

Use a Magic Command in a Python cell to release the MaxCompute MaxFrame connection:

```python
mf_session.destroy()
```
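Between creating and releasing the session, you can work with MaxCompute tables through the MaxFrame DataFrame API. The following is a minimal sketch, assuming a table named product exists in the connected MaxCompute project; the table name is illustrative.

```python
import maxframe.dataframe as md

# Read a MaxCompute table as a MaxFrame DataFrame. Operations are lazy
# and are executed on the MaxCompute side.
df = md.read_odps_table("product")

# Trigger execution and fetch a small preview of the data.
print(df.head(5).execute())
```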
Scenario 2: Connect to a MaxCompute computing resource
When developing in a Python cell, you can use the following built-in Magic Command to open the MaxCompute computing resource selector. This lets you interact with MaxCompute using Python for operations such as data loading, queries, and DDL operations.
Use a Magic Command to create a MaxCompute connection. Entering the following command in a cell opens the MaxCompute computing resource selector.

```python
o = %odps
```

Use the obtained MaxCompute computing resource to run a PyODPS script. For example, to retrieve all tables in the current project:

```python
with o.execute_sql('show tables').open_reader() as reader:
    print(reader.raw)
```
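You can also load query results into a local pandas DataFrame for further processing in Python. A minimal sketch, assuming a table named product exists in the connected project:

```python
# Run a SQL query on MaxCompute and convert the result set to pandas.
with o.execute_sql('SELECT * FROM product LIMIT 100').open_reader() as reader:
    pdf = reader.to_pandas()

print(pdf.head())
```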
Write data from a dataset to a MaxCompute table
DataWorks supports creating NAS-type datasets. You can then use the dataset in Notebook development to read and write data in NAS storage.
The following example shows how to write test data (testfile.csv) from a dataset attached to a personal development environment instance (mount path: /mnt/data/dataset02) to a MaxCompute table (mc_testtb).
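The following is a minimal sketch of this flow, assuming a MaxCompute connection o has already been created with the %odps Magic Command in an earlier cell; the pandas read and the PyODPS persist call are one possible way to perform the write.

```python
import pandas as pd
from odps.df import DataFrame as ODPSDataFrame

# Read the test file from the dataset mounted on the personal
# development environment instance.
pdf = pd.read_csv('/mnt/data/dataset02/testfile.csv')

# Write the pandas DataFrame to the MaxCompute table mc_testtb.
# persist() creates the table if it does not already exist.
ODPSDataFrame(pdf).persist('mc_testtb', odps=o)
```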

Pass SQL cell results to a Python cell
When a SQL cell produces output, a DataFrame variable is automatically generated. This variable can be accessed by a Python cell, enabling interaction between SQL and Python cells.
Run the SQL cell to generate a DataFrame.
If the SQL cell contains one query, the result of that query is automatically captured as a DataFrame variable.
If the SQL cell contains multiple queries, the DataFrame variable will be the result of the last query.
Note: The DataFrame variable name defaults to df_*. You can click the variable name in the lower-left corner of the cell to customize it.
Retrieve the DataFrame variable in a Python cell.
In a Python cell, you can retrieve the DataFrame variable by directly referencing its name.
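A minimal sketch, assuming the preceding SQL cell generated a DataFrame variable with the default name df_1 (check the actual name in the lower-left corner of the SQL cell):

```python
# The variable generated by the SQL cell is a regular DataFrame object
# and can be used directly in Python.
print(df_1.head())      # Preview the first rows.
print(df_1.describe())  # Basic statistics for numeric columns.
```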

Reference a Python resource in a Notebook
During Notebook development, you can reference a MaxCompute resource using the format ##@resource_reference{"custom_name.py"}. The following is a simple example of how to reference a Python resource:
Referencing a Python resource in a Notebook only works in the production environment. It does not work in the development environment. You must publish the Notebook to the production environment and execute it in the Operation Center.
Create a new Python resource
Add a Python resource file.
Go to the DataWorks Workspaces page. In the top navigation bar, switch to the destination region. Find the created workspace and, in the Actions column, go to Data Studio.
In the navigation pane on the left, click the icon to go to Resource Management.
On the Resource Management page, click the New button or the icon. You can also first Create a Folder to organize your resources, and then right-click the folder and choose New to select the specific resource or function type to create.
Create a MaxCompute Python resource. In this example, the Python resource is named hello.py.
Edit the content of the Python resource file. The following is sample code:
```python
# hello.py
def greet(name):
    print(f"Hello, {name}!")
```

After editing, click Save to save the Python code.
After you edit and save the code, click the icon to commit the hello.py resource. After the resource is committed, click the icon to publish the hello.py resource to the development and production environments.
Reference the Python resource
Add a Notebook node. For more information, see Create a Notebook node.
Add a Python cell to the Notebook. For more information, see Add a cell.
In the Python cell, write ##@resource_reference{"hello.py"} to reference the new MaxCompute Python resource. The following is sample code:

```python
# This comment references a Python resource named hello.py during scheduling.
##@resource_reference{"hello.py"}

import sys
import os

# Add the current working directory to the path so that the referenced
# resource file can be imported. Adjust the path as needed.
sys.path.append(os.path.abspath('.'))

from hello import greet  # Replace with the actual function name.

greet('DataWorks')
```

After you write the code in the Python cell and configure the node scheduling, save and publish the Notebook node.
Go to the Operation Center (Workflow) and find the published Notebook node. In the Actions column, click Backfill Data to perform a data backfill for the Notebook node. For more information about data backfill, see Perform data backfill and view the data backfill instance (new version).
After the data backfill is complete, you can view the run log of the Notebook node to confirm whether the Python cell was executed successfully.
Reference workspace parameters in a Notebook
During Notebook development, you can reference workspace parameters in SQL and Python cells using the format ${workspace.param}. The following is a simple example of how to reference a workspace parameter.
Before you reference a workspace parameter in a cell, you must create the workspace parameter.
In the example, param is the name of the workspace parameter you created. Replace it with the name of your desired workspace parameter during development.
Reference a workspace parameter in a SQL cell.
```sql
SELECT '${workspace.param}';
```

This statement queries the workspace parameter. After a successful run, the specific value of the workspace parameter is printed.
Reference a workspace parameter in a Python cell.
```python
print('${workspace.param}')
```

This statement outputs the workspace parameter. After a successful run, the specific value of the workspace parameter is printed.
Use PySpark with Magic Commands
During Notebook development, you can use Magic Commands in a Python cell to quickly create and start a Livy service. This connects to MaxCompute Spark and EMR Serverless Spark computing resources for efficient development and debugging.
Scope:
MaxCompute computing resources and EMR Serverless Spark computing resources.
Personal development environment instances created before 2025-08-01 do not support this feature. To use this feature, create a new personal development environment instance.
Prerequisites: You have attached a MaxCompute computing resource or an EMR Serverless Spark computing resource to your workspace.
Connect to a computing resource using Python
In a Notebook's Python cell, you can use the following commands to quickly create, connect to, or release a Livy service on the target computing resource.
MaxCompute commands
Magic command | Description | Notes
--- | --- | ---
%maxcompute_spark | Running this command creates a Livy service and starts a Spark Session on the MaxCompute computing resource. Note: You cannot view Livy and Spark Session information in the MaxCompute console. | Running a Notebook in DataStudio: You must select the name of a personal development environment instance. The first time you run this command in a Notebook within the selected instance, a new Livy service is created. If the Livy service is not deleted, subsequent runs of the command reuse the existing Livy service. Running a Notebook after publishing to production: Each task instance creates a new Livy service, which is automatically stopped and deleted when the task instance finishes running.
 | Running this command cleans up the Spark Session and stops the Livy service. | If you publish the Notebook task to the production environment, the task code does not need to include this Magic Command.
 | Running this command deletes the Livy service. | When a Notebook task instance runs in the production environment, the system automatically appends this command to the end of the task code so that the Livy service is deleted after the run.
EMR Serverless Spark commands
Magic command | Description | Notes
--- | --- | ---
%emr_serverless_spark | Running this command creates a Livy service and starts a Spark Session on the EMR Serverless Spark computing resource. Note: After you run the command, you can go to the E-MapReduce console to view and manage the Livy Gateway and Spark Session of the EMR Serverless Spark engine. A Livy service created through a DataWorks Notebook has a name with a dedicated prefix. | Running a Notebook in DataStudio: You must select the name of a personal development environment instance. The first time you run this command in a Notebook within the selected instance, a new Livy service is created. If the Livy service is not deleted, subsequent runs of the command reuse the existing Livy service. Running a Notebook after publishing to production: Each task instance creates a new Livy service, which is automatically stopped and deleted when the task instance finishes running.
 | Running this command cleans up the Spark Session and stops the Livy service. | If you publish the Notebook task to the production environment, the task code does not need to include this Magic Command.
 | Running this command deletes the Livy service. | When a Notebook task instance runs in the production environment, the system automatically appends this command to the end of the task code so that the Livy service is deleted after the run.
Submit and execute Spark code using Python
You can add a Python cell in a Notebook to edit and execute PySpark code.
Ensure that you are connected to the target computing resource. In a preceding Python cell, you must have already used a Magic Command (such as %emr_serverless_spark or %maxcompute_spark) to connect to the target computing resource. For more information, see Connect to a computing resource using Python.

Write PySpark code. In a new Python cell, add the %%spark command to use the Spark computing resource connected in the previous step, and then edit your PySpark code. For example:

```python
%%spark
spark.sql("DROP TABLE IF EXISTS dwd_user_info_d")
spark.sql("CREATE TABLE dwd_user_info_d(id STRING, name STRING, age BIGINT, city STRING)")
spark.sql("INSERT INTO dwd_user_info_d SELECT '001', 'Jack', 30, 'Beijing'")
spark.sql("SELECT * FROM dwd_user_info_d").show()
spark.sql("SELECT COUNT(*) FROM dwd_user_info_d").show()
```

Note:
If a Python cell includes the %%spark command, it connects to and runs on the target computing resource's Spark engine.
If a Python cell does not include the %%spark command, it runs only in the local environment.
Appendix: General operations
DataWorks Notebook operations are based on VSCode's Jupyter Notebook. The following are some general operations for cells:
Add tags: Click the icon below a cell to quickly add more tags.
View variables: Click the icon to view all variables in the Notebook, including the Name, Type, Size, and Value of each variable.