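I have a Google Cloud Platform account running Kubeflow Pipelines. The first component of the pipeline preprocesses some data, and the second component trains a model (a scikit-learn decision tree classifier) on that preprocessed data. To give a code sample, the example below is a simplified version of the pipeline's second component: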

# Only the imports this simplified example actually uses
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics, datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
y = iris.target

x_train_data, x_test_data, y_train_data, y_test_data = train_test_split(X, y, test_size=0.3, random_state=1, shuffle=True)

print("Creating model")
model = DecisionTreeClassifier()

print(f"Training model ({type(model)})")
model.fit(x_train_data, y_train_data)

print("Evaluating model")
y_train_pred = model.predict(x_train_data)
print("y_train_pred: ", y_train_pred.shape)

y_test_pred = model.predict(x_test_data)
print("y_test_pred: ", y_test_pred.shape)

train_accuracy = metrics.accuracy_score(y_train_data, y_train_pred)
train_classification_report = metrics.classification_report(y_train_data, y_train_pred)

print("\nTraining result:")
print(f"Accuracy:\t{train_accuracy}")
print(f"Classification report:\t{type(train_classification_report)}\n{train_classification_report}")

test_accuracy = metrics.accuracy_score(y_test_data, y_test_pred)
test_classification_report = metrics.classification_report(y_test_data, y_test_pred)

print("\nTesting result:")
print(f"Accuracy:\t{test_accuracy}")
print(f"Classification report:\t{type(test_classification_report)}\n{test_classification_report}")

print("\nDONE !\n")

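Here, instead of loading the preprocessed data, I use the Iris dataset from scikit-learn, but the output is exactly the same. Everything seems to work as expected, and every print statement shows up in the Kubeflow output console. However, after the second component finishes executing (after the last print appears correctly in the output console), the following error is raised: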

Traceback (most recent call last):
  File "<string>", line 181, in <module>
  File "<string>", line 151, in _serialize_str
TypeError: Value "None" has type "<class 'NoneType'>" instead of str.

Do you know why this happens? Am I doing something wrong, or is this a Google Cloud / Kubeflow Pipelines issue?

Thanks in advance!
