amazon-web-services - Feature extraction for real-time prediction in SageMaker

Question

I want to use SageMaker to deploy a machine learning model for real-time fraud detection predictions.

I use a SageMaker Jupyter notebook instance to:

- load my training data, which contains transactions, from S3
- preprocess the data and engineer features (I use category_encoders to encode the categorical values)
- train the model and configure the endpoint

For the inference step, I use a Lambda function that invokes my endpoint to get a prediction for each real-time transaction.

Should I calculate all the features again for these real-time transactions in the Lambda function?

For the features where I use category_encoders with the fit_transform() function to convert my categorical features to numerical ones, what should I do, given that the result will not be the same as on the training set?

Is there another method that avoids redoing the feature calculation in the inference step?

score 0 · Accepted Answer

Should I calculate all the features again for these real-time transactions in the Lambda function?

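Yes. When you run inference with a trained model (that is, predict on real-time data), you should pass exactly the same list of features that the model was trained on. If you computed some features at training time (for example, part of the day from a timestamp), you should also compute them at inference time.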
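
One way to keep training and inference in sync is to put the feature logic in a single function that both the notebook and the Lambda handler import. The sketch below is only an illustration: the field names `timestamp` and `amount`, the hour boundaries, and the function names `part_of_day` and `build_features` are assumptions, not part of the original question.

```python
from datetime import datetime


def part_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a coarse label (hour boundaries are hypothetical)."""
    hour = ts.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def build_features(transaction: dict) -> dict:
    """Compute derived features for one transaction.

    Calling this same function at training time and inside the Lambda
    handler keeps the two code paths from drifting apart.
    """
    ts = datetime.fromisoformat(transaction["timestamp"])
    return {
        "amount": float(transaction["amount"]),
        "part_of_day": part_of_day(ts),
    }


# Hypothetical incoming transaction, as the Lambda function might receive it.
example = build_features({"timestamp": "2021-03-04T14:30:00", "amount": "19.99"})
```

The categorical output (`part_of_day` here) would still need to go through the encoder that was fitted during training before it is sent to the endpoint.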

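For the features where I use category_encoders with the fit_transform() function to convert my categorical features to numerical ones, what should I do, given that the result will not be the same as on the training set?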

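You should store all the transformations that were fitted while training the model: numeric scalers, categorical encoders, and so on. At inference time, load them and call only transform(), never fit() or fit_transform().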

In Python, it looks like this:

import joblib  # for dumping and loading fitted transformers
import category_encoders as ce

# 1. At training time
# fit the encoder on historical data
encoder = ce.OneHotEncoder(cols=[...])
encoder.fit(X, y)
# and dump it to disk
joblib.dump(encoder, 'filename.joblib')

# 2. At inference time
# load the fitted encoder
encoder = joblib.load('filename.joblib')
# and apply the same transformation to new data (transform only, no fit)
X_new_encoded = encoder.transform(X_new)
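
To see why refitting at inference time gives different results, here is a toy sketch using only the standard library. It is not the category_encoders API: `fit_mapping` and `apply_mapping` are hypothetical helpers that mimic an ordinal encoder, and JSON stands in for joblib as the storage format.

```python
import json


def fit_mapping(values):
    """Toy 'fit': give each distinct category an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping) + 1)
    return mapping


def apply_mapping(mapping, values):
    """Toy 'transform': look up each category; unseen categories become -1."""
    return [mapping.get(value, -1) for value in values]


# Training time: fit once on historical data and persist the fitted state.
train_merchants = ["grocery", "travel", "electronics", "travel"]
stored = json.dumps(fit_mapping(train_merchants))

# Inference time: load the stored mapping and only transform.
loaded = json.loads(stored)
consistent = apply_mapping(loaded, ["electronics"])

# Refitting on the single live transaction yields a different code.
live = ["electronics"]
refit = apply_mapping(fit_mapping(live), live)
```

Here `consistent` is `[3]`, the code the model saw during training, while `refit` is `[1]`, because a mapping fitted on one transaction knows nothing about the other categories. The same reasoning applies to a real encoder: the fitted object has to travel with the model, for example by packaging the dumped file with the Lambda function.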
