azureml - AzureML pipeline: how to access step1's data in step2

Question

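I am following this Microsoft article to create an Azure ML pipeline with two steps, and I want step2 to use the data written by step1. According to the article, the code below should pass the path of the data written by step1 to the step2 script as an argument: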

datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(name="processed_data", destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()

step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig = aml_run_config,
    arguments = ["--output_path", step1_output_data]
)

step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig = aml_run_config,
    arguments = ["--pd", step1_output_data.as_input]

)

pipeline = Pipeline(workspace=ws, steps=[step1, step2])

But when I read the pd argument in step2.py, its value is:

<bound method OutputFileDatasetConfig.as_mount of <azureml.data.output_dataset_config.OutputFileDatasetConfig object at 0x7f8ae7f478d0>>

Any idea how to pass the blob storage location that step1 wrote its data to into step2?

score 0 · Accepted Answer

You may find what you need here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines. Pay particular attention to the section "Read OutputFileDatasetConfig as inputs to non-initial steps":

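from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# workspace, aml_run_config and compute are assumed to be defined earlier.

# get adls gen 2 datastore already registered with the workspace
datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(
    name="processed_data",
    destination=(datastore, "mypath/{run-id}/{output-name}")
).as_upload()

# step1 writes its results to the path it receives as --output_path
step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig=aml_run_config,
    arguments=["--output_path", step1_output_data]
)

# step2 receives step1's output as an input; note that as_input() is called
step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig=aml_run_config,
    arguments=["--pd", step1_output_data.as_input()]
)

pipeline = Pipeline(workspace=workspace, steps=[step1, step2])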
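On the consuming side, step2.py only has to read the --pd argument. The following is a minimal sketch, not taken from the linked article. It assumes that at run time the argument arrives as a local directory path on the compute target. The helper names parse_args, list_input_files and main are hypothetical.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments that the pipeline passes to step2.py."""
    parser = argparse.ArgumentParser()
    # "--pd" matches the argument name used in the step2 definition.
    parser.add_argument("--pd", dest="pd", required=True,
                        help="Directory where step1's output is available")
    return parser.parse_args(argv)


def list_input_files(input_dir):
    """Return the relative paths of all files under input_dir, sorted."""
    found = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, input_dir))
    return sorted(found)


def main(argv=None):
    """Print and return the files that step1 produced."""
    args = parse_args(argv)
    files = list_input_files(args.pd)
    for rel_path in files:
        print(rel_path)
    return files

# In a real step2.py you would end the file with a call to main().
```

Listing the files first is a quick way to confirm that step2 really sees step1's output before adding any processing logic.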

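Your mistake is probably that as_input is a method of OutputFileDatasetConfig, not an attribute, so it has to be called with parentheses: as_input(). Without them, the bound method object itself is passed as the argument, and step2.py receives its string representation instead of a path.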
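The difference is plain Python behaviour and can be reproduced without Azure ML. The sketch below uses a hypothetical stand-in class, FakeOutputConfig, which is not the real OutputFileDatasetConfig. It only shows what ends up in the argument list in each case.

```python
class FakeOutputConfig:
    """Hypothetical stand-in for OutputFileDatasetConfig; not the real class."""

    def __init__(self, name):
        self.name = name

    def as_input(self):
        # The real method returns an input configuration object that the
        # pipeline resolves at run time; a plain string is used here only
        # to make the difference visible.
        return f"input:{self.name}"


config = FakeOutputConfig("processed_data")

# Missing parentheses: the method object itself is converted to a string.
without_call = str(config.as_input)

# With parentheses: the method runs and its return value is used.
with_call = str(config.as_input())

print(without_call)  # starts with "<bound method FakeOutputConfig.as_input of"
print(with_call)
```

The first printed line has the same shape as the value seen in step2.py in the question, which is the sign that a method was passed without being called.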
