My training data contains None values, so I built an sklearn pipeline with an imputer to handle them. I then train the pipeline model with MLflow tracking enabled:

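import mlflow
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

transformers = [
    ("numerical", SimpleImputer(strategy="mean"), ["foo", "bar"]),
    # ... other transformers ...
]
# Pass the transformer list directly; the keyword arguments go outside it.
preprocessor = ColumnTransformer(
    transformers, remainder="passthrough", sparse_threshold=0
)
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", DecisionTreeClassifier(...)),
])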

with mlflow.start_run(run_name="my_run") as mlflow_run:
    model.fit(X_train, y_train)
    mlflow.sklearn.eval_and_log_metrics(model, X_val, y_val)

The trained model itself can predict without any problem:

model.predict(X_train)  # ==> OK

But the pyfunc object I get back from MLflow does not work:

model_pyfunc = mlflow.pyfunc.load_model(
  'runs:/{run_id}/model'.format(
    run_id=mlflow_run.info.run_id
  )
)

model_pyfunc.predict(X_train)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<command-4119235480823764> in <module>
      5 )
      6 
----> 7 model_pyfunc.predict(X_train)

/databricks/python/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py in predict(self, data)
    594         if input_schema is not None:
    595             data = _enforce_schema(data, input_schema)
--> 596         return self._model_impl.predict(data)
    597 
    598     @property

/databricks/python/lib/python3.8/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

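This only happens with data that contains None values. If I run inference on a subset of X_train without None values, model_pyfunc.predict works as well. Is this a known issue? Any suggestions on how to fix it?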
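
Since the failure depends on None being present, one thing that may be worth checking is the dtype of the affected columns. The sketch below is only a diagnostic under an assumption the truncated traceback does not confirm: that None leaves "foo" and "bar" as object-dtype columns, and that this interacts badly with the pyfunc input handling. The helper name coerce_none_to_nan and the toy frame X_demo are hypothetical and stand in for the real X_train.

```python
import pandas as pd


def coerce_none_to_nan(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df where the given columns are float64 and None is NaN."""
    out = df.copy()
    for col in columns:
        # to_numeric turns None into NaN; astype pins the dtype to float64.
        out[col] = pd.to_numeric(out[col]).astype("float64")
    return out


# Toy frame standing in for X_train. The column names come from the question,
# the values are made up for illustration.
X_demo = pd.DataFrame(
    {"foo": [1.0, None, 3.0], "bar": [None, 2, 4]}, dtype=object
)
print(X_demo.dtypes)   # object columns, forced here via dtype=object

X_clean = coerce_none_to_nan(X_demo, ["foo", "bar"])
print(X_clean.dtypes)  # the same columns after coercion
```

If X_train really does have object columns, passing the coerced copy to model_pyfunc.predict would show whether the dtype is the trigger. If the error persists, the cause lies elsewhere and the full traceback would be needed.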
