
I want to use SageMaker to deploy a machine learning model that makes real-time predictions for fraud detection.

I use a SageMaker Jupyter notebook instance to:

- load my training data, which contains transactions, from S3
- preprocess the data and do feature engineering (I use category_encoders to encode the categorical values)
- train the model and configure the endpoint

For the inference step, I use a Lambda function that invokes my endpoint to get a prediction for each real-time transaction.

Should I calculate all the features again for these real-time transactions in the Lambda function?

For the features where I use category_encoders with the fit_transform() function to turn a categorical feature into a numerical one, what should I do, given that the result will not be the same as on the training set?

Is there another method that avoids redoing the feature calculation in the inference step?

1 Answer


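Should I calculate all the features again for these real-time transactions in the Lambda function?

Yes. When you run inference with a trained model (that is, predict on real-time data), you should pass exactly the same list of features that the model was trained on. If you computed some features at training time (for example, the part of the day derived from a timestamp), you must compute those features at inference time as well.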
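One way to keep training and inference features identical is to put the feature logic in a single function that both the notebook and the Lambda function import. The following is only a minimal sketch: the field names `amount` and `timestamp`, the bucket boundaries, and the helper names `part_of_day` and `build_features` are assumptions for illustration, not taken from the question.

```python
from datetime import datetime


def part_of_day(timestamp: str) -> str:
    """Bucket an ISO 8601 timestamp into a coarse part of the day."""
    hour = datetime.fromisoformat(timestamp).hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def build_features(transaction: dict) -> dict:
    """Compute the model features for one transaction.

    Calling this same function at training time and at inference time
    keeps the feature definitions from drifting apart.
    """
    return {
        "amount": float(transaction["amount"]),  # hypothetical field
        "part_of_day": part_of_day(transaction["timestamp"]),  # hypothetical field
    }


features = build_features({"amount": "120.50", "timestamp": "2021-09-07T15:45:03"})
```

The notebook would apply `build_features` to every historical transaction, and the Lambda function would apply it to the single incoming transaction.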

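For the features where I use category_encoders with the fit_transform() function to turn a categorical feature into a numerical one, what should I do, given that the result will not be the same as on the training set?

You should store every transformation that was fitted while training the model: numeric scalers, categorical encoders, and so on.

In Python, saving and reloading a fitted encoder looks like this: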
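To see why the fitted transformation has to be reused, consider a toy example. `ToyOrdinalEncoder` below is a hypothetical pure-Python stand-in, not the category_encoders implementation; it only mimics the fit / transform / fit_transform pattern to show how refitting on live data changes the codes.

```python
class ToyOrdinalEncoder:
    """Hypothetical stand-in encoder: maps each category to an integer code."""

    def fit(self, values):
        # Codes are assigned in order of first appearance in the data seen here.
        self.mapping_ = {}
        for value in values:
            self.mapping_.setdefault(value, len(self.mapping_) + 1)
        return self

    def transform(self, values):
        # Categories that were never seen during fit get the code 0.
        return [self.mapping_.get(value, 0) for value in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = ["visa", "mastercard", "amex", "visa"]
live = ["amex"]  # a single real-time transaction

fitted = ToyOrdinalEncoder().fit(train)

consistent = fitted.transform(live)  # reuses the mapping learned on the training data
refit = ToyOrdinalEncoder().fit_transform(live)  # rebuilds the mapping from live data only
```

Here `consistent` carries the code that the model saw for "amex" during training, while `refit` assigns a different code to the same category, so the model would receive an input it was never trained to interpret.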

import joblib  # used to save and load fitted transformers
import category_encoders as ce

# 1. At training time
# Fit the encoder on historical data
encoder = ce.OneHotEncoder(cols=[...])
encoder.fit(X, y)
# Save the fitted encoder to disk
joblib.dump(encoder, 'filename.joblib')

# 2. At inference time
# Load the fitted encoder
encoder = joblib.load('filename.joblib')
# Apply the same transformation to new data (transform only, do not fit again)
X_new_encoded = encoder.transform(X_new)
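Once a transaction has been encoded, the Lambda function can send the resulting row to the endpoint. The sketch below shows one possible shape for that call. It assumes the model container accepts `text/csv` and replies with JSON, which depends on how the model was deployed; the endpoint name, the `predict` helper, and the `FakeRuntime` stand-in are hypothetical. `invoke_endpoint` with `EndpointName`, `ContentType`, and `Body` is the boto3 `sagemaker-runtime` client call.

```python
import io
import json

ENDPOINT_NAME = "fraud-detection-endpoint"  # hypothetical endpoint name


def predict(feature_row, runtime_client, endpoint_name=ENDPOINT_NAME):
    """Send one already encoded feature row to the endpoint and return its reply.

    feature_row: numeric features in the same order that was used for training.
    runtime_client: a boto3 "sagemaker-runtime" client, or a stand-in object
    exposing the same invoke_endpoint method.
    """
    body = ",".join(str(value) for value in feature_row)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the container accepts CSV input
        Body=body,
    )
    # Assumption: the container replies with a JSON document.
    return json.loads(response["Body"].read())


class FakeRuntime:
    """Stand-in client for a local dry run; no AWS call is made."""

    def __init__(self):
        self.last_request = None

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        return {"Body": io.BytesIO(b'{"fraud_probability": 0.5}')}


fake = FakeRuntime()
result = predict([120.5, 0, 1, 0], fake)
```

In the real Lambda function, the client would come from `boto3.client("sagemaker-runtime")`, and both the client and the encoder loaded with `joblib.load` would typically be created outside the handler so that warm invocations reuse them.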
Answered 2021-09-07T15:45:03.907