azure - How do I correctly output data to an Azure ML Batch Endpoint using Python?

Question

When invoking Azure ML Batch Endpoints (which create jobs for inferencing), the run() method should return a pandas DataFrame or an array, as the documentation describes.

However, the example shown there does not produce output with CSV headers, which is often what is needed.

The first thing I tried was returning the data as a pandas DataFrame. The result was just a plain CSV with a single column and no headers.

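When I tried to pass the values in several columns along with their corresponding headers, to be saved as CSV later, the output contained awkward square brackets (which denote a list in Python) and apostrophes (which denote strings).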
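As a minimal illustration of where such characters can come from (the values are made up, and this is plain Python rather than the endpoint's own serialization code): converting a list to text with str() yields its repr, brackets and quotes included, whereas joining the values yields a clean CSV line.

```python
# Hypothetical header and row, for illustration only.
columns = ["id", "prediction"]
row = ["42", 0.87]

# str() on a list keeps the list syntax: brackets, quotes, and ", " separators.
as_text = str(columns)
print(as_text)  # ['id', 'prediction']

# Joining the stringified values produces a plain comma-separated line.
joined = ",".join(str(value) for value in row)
print(joined)  # 42,0.87
```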

I have not been able to find documentation elsewhere that solves this.

score 2 · Accepted Answer

This is the way I found to produce clean CSV-formatted output from a batch endpoint invocation in AzureML using Python:

import logging

import pandas as pd

logger = logging.getLogger(__name__)

# `model` is assumed to be a module-level global loaded in init().


def run(mini_batch):
    batch = []
    for file_path in mini_batch:
        df = pd.read_csv(file_path)

        # Do any data quality verification here:
        if 'id' not in df.columns:
            logger.error("ERROR: CSV file uploaded without id column")
            return None
        df['id'] = df['id'].astype(str)

        # Create the predictions with the model previously loaded in init():
        df['prediction'] = model.predict(df)
        # or alternatively: df[MULTILABEL_LIST] = model.predict(df)

        batch.append(df)

    batch_df = pd.concat(batch, ignore_index=True)

    # The first output line is the header: the column names joined by commas,
    # with no square brackets or apostrophes.
    result = [','.join(str(column) for column in batch_df.columns)]

    # Convert every value to a string, row by row, with a comma between values.
    # Joining the values directly keeps values that contain spaces intact.
    for row in batch_df.itertuples(index=False, name=None):
        result.append(','.join(str(value) for value in row))

    logger.info("Finished Run")
    return result
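
The run() function above builds each line by hand, so a value that itself contains a comma or a quote would not be escaped. A possible variation, sketched here with Python's standard csv module, quotes such values automatically. The helper names format_csv_line and dataframe_to_csv_lines are hypothetical, not part of the original answer or of Azure ML; the returned list could take the place of result in run().

```python
import csv
import io

import pandas as pd


def format_csv_line(values):
    """Format one sequence of values as a single CSV line, quoting as needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    # Drop the trailing line terminator added by the writer.
    return buffer.getvalue()[:-1]


def dataframe_to_csv_lines(df):
    """Return the header line followed by one CSV line per row of df."""
    lines = [format_csv_line(df.columns)]
    for row in df.itertuples(index=False, name=None):
        lines.append(format_csv_line(row))
    return lines


# Usage with made-up data:
example = pd.DataFrame({
    "id": ["a1", "b2"],
    "note": ["hello world", "x, y"],
    "prediction": [0.5, 1.0],
})
for line in dataframe_to_csv_lines(example):
    print(line)
```

Values with spaces pass through unchanged, and the value containing a comma is wrapped in double quotes so that it still occupies a single column.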
