DataWorks: Use third-party packages and custom Python scripts in PyODPS nodes

Last Updated: Sep 28, 2025

When PyODPS built-in libraries don't meet your needs, you can extend functionality with third-party packages or custom Python scripts. This topic shows how to install third-party packages using custom images and O&M Assistant, and how to import a Python script.

Choose your integration approach

Use case | Resource group type | Solution
Using third-party packages | Serverless resource group | Install third-party packages using a custom image
Using third-party packages | Exclusive resource group for scheduling | Install third-party packages using O&M Assistant
Using custom .py script files | Serverless resource group or exclusive resource group for scheduling | Create a Python script

The following figure shows the main process for each solution.

[Figure: main process for each solution]

Preparations

Before you start, understand the following two key concepts to determine your configuration method.

  1. PyODPS 3 vs. PyODPS 2

    • PyODPS 3 (recommended): Based on the Python 3.7+ environment. Official support for Python 2 has ended. This topic uses PyODPS 3 as the primary example.

    • PyODPS 2: Based on the Python 2.7 environment.

  2. Resource groups: serverless resource group vs. exclusive resource group for scheduling

    • Serverless resource group (recommended): Elastic and maintenance-free. It relies on custom images to manage dependencies.

    • Exclusive resource group for scheduling: Requires manual maintenance of the resource group. Dependencies are installed through O&M Assistant, which has more limitations.

    Note

    In the DataWorks console, go to the Resource Group page in workspace details to view the type of resource group bound to your workspace.

    • General-purpose Type indicates you are using a serverless resource group.

    • Data Scheduling indicates you are using an exclusive resource group for scheduling.
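If you are unsure which Python environment a node runs in, a quick check inside the node can confirm it. The following is a minimal sketch; the helper name pyodps_node_type is hypothetical and only maps the interpreter's major version to the node types described above.

```python
import sys


def pyodps_node_type(version_info=sys.version_info):
    """Map the interpreter's major version to the matching node type."""
    return "PyODPS 3" if version_info[0] >= 3 else "PyODPS 2"


# Print the interpreter version and the node type it corresponds to.
print("Python version: {}".format(sys.version.split()[0]))
print("Matching node type: {}".format(pyodps_node_type()))
```

The snippet uses only the standard library and str.format, so it should run unchanged in both Python 2.7 and Python 3 environments.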

Install third-party packages using a custom image

Note

This method applies to serverless resource groups.

This tutorial provides an end-to-end example that shows how to create a custom environment containing the pendulum package and call it in a PyODPS 3 node to get and format the current time in a specific time zone.

Step 1: Create a custom image that contains pendulum

  1. Log on to the DataWorks console and go to Image Management.

  2. Select Custom Images.

  3. Click Create Image and configure the following key parameters:

    Parameter | Description
    Image Name | Example: pyodps3_with_pendulum.
    Reference Type | Select DataWorks Official Image.
    Image Name/ID | Select dataworks_pyodps_task_pod.
    Supported Task Types | Select PyODPS 3.
    Installation Package | Select Python3 and then select the pendulum package from the drop-down list. For more information about installation commands, see Appendix: Installation command reference.

  4. Click OK.

  5. On the Custom Images page, test and publish the image.

  6. In the Actions column of the image, click the icon, and then select Change Workspace to bind the custom image to the target workspace.

Step 2: Create and configure a PyODPS 3 node

  1. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the target workspace from the drop-down list, and click Go to Data Development.

  2. Create a new PyODPS 3 node and name it, for example, pyodps_pendulum_test.

  3. In the editor, write the code:

    # Import pendulum (already installed in the custom image)
    import pendulum
    print("Starting to test the third-party package pendulum...")
    try:
        # Use pendulum to get the current time in the Asia/Shanghai time zone.
        shanghai_time = pendulum.now('Asia/Shanghai')
        # Print the formatted time and time zone information.
        print("Successfully imported the 'pendulum' package.")
        print(f"The current time in Shanghai is: {shanghai_time.to_datetime_string()}")
        print(f"The corresponding time zone is: {shanghai_time.timezone_name}")
    
        print("\nTest passed! The PyODPS node successfully called the third-party package.")
    except Exception as e:
        print(f"Test failed. An error occurred: {e}")

Step 3: Verify the result

  1. Click the Run icon in the toolbar to run the code. In the Parameters dialog box, select the image you created, pyodps3_with_pendulum.

    Important

    If you cannot find the target image, bind the image to the current workspace.

  2. Review the run logs in the bottom panel. The following output indicates that the pendulum package was successfully called:

    Starting to test the third-party package pendulum...
    Successfully imported the 'pendulum' package.
    The current time in Shanghai is: 2025-09-27 15:45:00
    The corresponding time zone is: Asia/Shanghai
    Test passed! The PyODPS node successfully called the third-party package.

Step 4: Publish the PyODPS 3 node

After completing the test, go to Properties > Resource Group. Select the prepared serverless resource group and change the image to your custom image, pyodps3_with_pendulum. Then, publish the node to Operation Center.

Install third-party packages using O&M Assistant

Important

This method applies to exclusive resource groups for scheduling, which are no longer recommended.

  1. Log on to the DataWorks Workspaces page, switch the region at the top, find the target workspace, and click Details in the Actions column.

  2. In the left-side navigation pane, click Resource Group and find the bound exclusive resource group for scheduling. In the Actions column, click the icon and select O&M Assistant.

  3. Select Create Command in the upper-left corner.

  4. For a PyODPS 3 node, select Python 3 and keep the default values for the other options. From the drop-down list, select the pendulum installation package.

  5. On the O&M Assistant page, click Run Command in the Actions column. Once the command completes, you can use import pendulum directly in the corresponding PyODPS node.

Create a Python script

If you only want to call a function from another .py file that you wrote, follow these steps:

  1. Create a Python resource:

    1. On the DataStudio page, right-click the target business flow and choose Create Resource > MaxCompute > Python.

    2. In the Create Resource dialog box, enter a name for the resource (for example, my_utils.py) and click Create.

    3. Enter the following code:

      def say_hello(name):
          print(f"Hello, {name}! This is from my_utils module.")
    4. Save and submit the resource.

  2. Create a PyODPS 3 node and reference the Python resource:

    • In the target business flow, right-click MaxCompute, select Create Node > PyODPS 3 and create the node.

    • In the code editor, reference the Python resource using ##@resource_reference{"my_utils.py"} as shown in the following code:

      ##@resource_reference{"my_utils.py"}
      import sys
      import os
      # Add the current directory of the resource to the Python interpreter's search path
      sys.path.append(os.path.dirname(os.path.abspath('my_utils.py')))
      # Import and use it like a normal module
      import my_utils
      my_utils.say_hello("DataWorks")
  3. Run the node. You will see "Hello, DataWorks! This is from my_utils module." in the run logs.
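The import in the node works because the directory that holds the resource file is added to sys.path before the import statement runs. The following standalone sketch illustrates that mechanism outside DataWorks. It is an illustration under assumptions: a temporary directory stands in for the location of the referenced resource, and the module name my_utils_demo is hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Assumption: a temporary directory stands in for the directory where the
# referenced resource file is placed at run time.
resource_dir = tempfile.mkdtemp()
with open(os.path.join(resource_dir, "my_utils_demo.py"), "w") as f:
    f.write("def say_hello(name):\n    return 'Hello, ' + name + '!'\n")

# Until the directory is on sys.path, importing the module raises ImportError.
if resource_dir not in sys.path:
    sys.path.append(resource_dir)

# Make sure the import system notices the file that was just written.
importlib.invalidate_caches()
my_utils_demo = importlib.import_module("my_utils_demo")

greeting = my_utils_demo.say_hello("DataWorks")
print(greeting)
```

If the import fails in a real node, printing sys.path and the current working directory is a reasonable first step to see where the interpreter is looking.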

FAQ

Q: Why does my custom image test hang during package installation?

A: This issue typically occurs due to network connectivity problems. Try these solutions:

  • If your task environment requires third-party packages from the Internet, ensure the VPC bound to your serverless resource group has Internet access. For more information, see Enable public network access for the resource group.

  • Try switching to a different Python package source, such as https://mirrors.aliyun.com/pypi/simple/.

Q: What should I do if importing a third-party package fails?

A: Troubleshoot as follows:

  1. Confirm that the custom image has been published successfully.

  2. Confirm that the task type supported by the image (PyODPS 2/3) matches the type of node you created.

  3. Confirm that the custom image is correctly selected in the Properties of the PyODPS node. Shared resource groups cannot be selected.

  4. Check whether the installed package version is compatible with your Python version (for example, pendulum 3.0+ does not support Python 2).
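Some of these checks can also be made from inside the node. The following is a minimal sketch for a Python 3 (PyODPS 3) environment; the helper name diagnose_import is hypothetical. It reports the interpreter version, whether the package can be found, and the version string if the package exposes a __version__ attribute (not every package does).

```python
import importlib
import importlib.util
import sys


def diagnose_import(package_name):
    """Return a small report on whether a top-level package can be imported."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "installed": importlib.util.find_spec(package_name) is not None,
        "version": None,
        "error": None,
    }
    if report["installed"]:
        try:
            module = importlib.import_module(package_name)
            # Not every package defines __version__, so fall back to None.
            report["version"] = getattr(module, "__version__", None)
        except Exception as exc:  # a broken install can still fail here
            report["error"] = repr(exc)
    return report


print(diagnose_import("pendulum"))
```

If installed is False, the node is most likely not running with the image or resource group that contains the package. If error is set, the package was found but failed to load, which often points to a version mismatch.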

References

Appendix: Installation command reference

If you use the Script mode for a custom image or the Manual Installation mode in O&M Assistant to configure installation commands, use the following commands:

  • If you use a PyODPS 2 node, run the following command.

    pip install <package_to_install>

    Note

    If you are prompted to upgrade pip after running the command, run pip install --upgrade pip.

  • If you use a PyODPS 3 node, run the following command.

    /home/tops/bin/pip3 install <package_to_install>

    Note

    • If you are prompted to upgrade pip after running the command, run /home/tops/bin/pip3 install --upgrade pip.

    • If the error /home/admin/usertools/tools/cmd-0.sh:line 3: /home/tops/bin/python3: No such file or directory occurs, submit a ticket to request permissions.
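To keep the two command forms straight, the following sketch assembles the installation command string for each node type. It only builds the text and does not run anything. The helper build_install_command and its parameters are hypothetical. The optional index_url argument uses pip's standard -i (--index-url) option, which is one way to apply the alternative package source mentioned in the FAQ.

```python
# pip executables per node type, taken from the commands listed above.
PIP_BY_NODE_TYPE = {
    "PyODPS 2": "pip",
    "PyODPS 3": "/home/tops/bin/pip3",
}


def build_install_command(node_type, package, index_url=None):
    """Build (but do not run) the pip command for the given node type."""
    parts = [PIP_BY_NODE_TYPE[node_type], "install"]
    if index_url:
        # -i is pip's short form of --index-url.
        parts += ["-i", index_url]
    parts.append(package)
    return " ".join(parts)


print(build_install_command("PyODPS 3", "pendulum"))
print(build_install_command(
    "PyODPS 2", "pendulum",
    index_url="https://mirrors.aliyun.com/pypi/simple/",
))
```

The resulting strings can be pasted into the Script mode of a custom image or the Manual Installation mode in O&M Assistant.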