I have defined my pipeline in a separate file, model.py:
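from lightgbm import LGBMClassifier
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class TextSelector(BaseEstimator, TransformerMixin):
    """Select a single text column as a 1-D Series (what TfidfVectorizer expects)."""

    def __init__(self, field):
        self.field = field

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.field]


class NumberSelector(BaseEstimator, TransformerMixin):
    """Select a single numeric column as a 2-D DataFrame."""

    def __init__(self, field):
        self.field = field

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[[self.field]]


# TF-IDF on the 'text' column, reduced to 300 components.
text_features = Pipeline([
    ('selector', TextSelector(field='text')),
    ('vectorizer', TfidfVectorizer(min_df=5, max_df=0.25, ngram_range=(1, 1))),
    ('decomposer', TruncatedSVD(n_components=300))
])

# Combine the text features with the numeric 'other' column.
features = FeatureUnion([
    ('text_features', text_features),
    ('other_feature', NumberSelector(field='other')),
])

pipeline = Pipeline([
    ('features', features),
    ('lgbm', LGBMClassifier(max_depth=-1, n_estimators=300,
                            learning_rate=0.1, n_jobs=2,
                            class_weight='balanced'))
])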
Training and dumping the model:
import joblib

from model import pipeline

# X and y are the training features and labels, loaded elsewhere.
clf = pipeline.fit(X, y)
joblib.dump(clf, 'model.joblib')
To load the model, the script needs access to model.py. Where should I put this file when using Google ML Engine?
I tried
gcloud ml-engine local predict --model-dir=/path/to/models --json-instances=input.json --framework=SCIKIT_LEARN
with model.py inside the path/to/models directory.
Error:
cloud.ml.prediction.prediction_utils.PredictionError: Failed to load model: Could not load the model: /path/to/the/model/model.joblib. No module named 'model'. (Error code: 0)
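For context, here is a minimal sketch of how this kind of error can arise. It uses only the standard-library pickle module, which joblib builds on, and a throwaway module named demo_model with a Selector class; both names are hypothetical stand-ins for model.py and its classes. It is not what ML Engine runs internally, only an illustration that a pickled object stores a reference to its class's module rather than the class code.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def reproduce_missing_module_error():
    """Pickle an instance of a class from a temporary module, then try to
    load it again once that module can no longer be imported."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for model.py (the name demo_model is an assumption).
        Path(tmp, "demo_model.py").write_text(
            "class Selector:\n"
            "    def __init__(self, field):\n"
            "        self.field = field\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            demo_model = importlib.import_module("demo_model")
            payload = pickle.dumps(demo_model.Selector(field="text"))
        finally:
            # Simulate the serving environment: the module is gone.
            sys.path.remove(tmp)
            sys.modules.pop("demo_model", None)

    # The payload holds the names "demo_model" and "Selector", not the class body,
    # so unpickling has to import demo_model first.
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return payload, str(exc)
    return payload, None


payload, error = reproduce_missing_module_error()
print(error)
```

The same reasoning would suggest that the process loading model.joblib needs to be able to import a module called model, wherever the file itself is stored.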
Another question: is it possible to use lightgbm in ML Engine prediction?