My training data contains None values, so I built an sklearn pipeline with an imputer to handle them. I then trained the pipeline model with MLflow tracking enabled:
import mlflow
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

transformers = [
    ("numerical", SimpleImputer(strategy="mean"), ["foo", "bar"]),
    ......
]
preprocessor = ColumnTransformer(
    transformers, remainder="passthrough", sparse_threshold=0)
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", DecisionTreeClassifier(...)),
])
with mlflow.start_run(run_name="my_run") as mlflow_run:
    model.fit(X_train, y_train)
    mlflow.sklearn.eval_and_log_metrics(model, X_val, y_val)
The trained model itself can predict without any problem:
model.predict(X_train) # ==> OK
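For reference, here is a minimal, sklearn-only sketch of the same pipeline shape on made-up data, with no MLflow involved. The toy DataFrame, the extra "baz" column, the labels, and the `random_state` are all assumptions for illustration, not my real data. It only shows that the imputer step copes with missing values when the pipeline is called directly. It also prints the column dtypes, which may be worth comparing with what the pyfunc wrapper receives, since the traceback further down shows an `_enforce_schema` step running before `predict`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: pandas stores None in a float column as NaN.
X_toy = pd.DataFrame({
    "foo": [1.0, None, 3.0, 4.0, 5.0, None],
    "bar": [0.5, 1.5, None, 3.5, 4.5, 5.5],
    "baz": [0, 1, 0, 1, 0, 1],  # passed through untouched
})
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Same structure as the pipeline in the question, with made-up settings.
preprocessor = ColumnTransformer(
    [("numerical", SimpleImputer(strategy="mean"), ["foo", "bar"])],
    remainder="passthrough",
    sparse_threshold=0,
)
toy_model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", DecisionTreeClassifier(random_state=0)),
])

toy_model.fit(X_toy, y_toy)
toy_predictions = toy_model.predict(X_toy)  # works despite the missing values

print(X_toy.dtypes)
print(toy_predictions)
```

This does not reproduce the failure; it only isolates the part that works, so any difference has to come from the loading or schema-enforcement side.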
But the pyfunc object I get from MLflow does not work:
model_pyfunc = mlflow.pyfunc.load_model(
    'runs:/{run_id}/model'.format(
        run_id=mlflow_run.info.run_id
    )
)
model_pyfunc.predict(X_train)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<command-4119235480823764> in <module>
5 )
6
----> 7 model_pyfunc.predict(X_train)
/databricks/python/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py in predict(self, data)
594 if input_schema is not None:
595 data = _enforce_schema(data, input_schema)
--> 596 return self._model_impl.predict(data)
597
598 @property
/databricks/python/lib/python3.8/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
118
119 # lambda, but not partial, allows help() to work with update_wrapper
--> 120 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
121 # update the docstring of the returned function
122 update_wrapper(out, self.fn)
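This only happens with data that has None values. If I run inference on a subset of X_train without None, the model_pyfunc.predict function works as well. Is this a known issue? Any suggestions on how to fix it?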