python - Error when encoding data sequentially with Scikit-learn

Question

I am iterating over the rows of my DataFrame and trying to encode each row with my encoder.

for index, row in self.data.iterrows():
    data = self._encoder.transform(row)
    try:
        print(row.shape)
        results["classes"].append((self._model.predict(data) > 0.5).astype("int32"))
        results["probability"].append((self._model.predict(data)))
        results["rows"].append(index)
    except Exception as e:
        print(e)
        results["rows"].append(index)
        results["classes"].append("ERROR")
        results["probability"].append("ERROR")

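I then use my model to make a prediction. The encoder and the model were built with Scikit-learn and Keras; the model is saved with Keras's built-in save function and the encoder is exported to a joblib file. If I encode the entire DataFrame at once, everything works as expected.

I am trying to do this sequentially so that my program does not fail when the encoder raises an error about the data, in particular when a value the encoder has not seen before appears in one of the columns I one-hot encode.

I tried using iterrows(), and when I try to encode each row I get the following error: IndexError: tuple index out of range.

I also tried converting each row into its own DataFrame, but when I try to encode it I get: ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 67 and input n_features is 1.

What is the best way to loop over my data and encode and predict each row sequentially?

Full traceback of the second error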

Traceback (most recent call last):
  File "/home/build/x-predictive-model/main.py", line 18, in <module>
    network.predictSequentially()
  File "/home/build/x-predictive-model/myai.py", line 191, in predictSequentially
    encoded = self._encoded_data = self._encoder.transform(pd.DataFrame(row))
  File "/home/user1/anaconda3/envs/x-model-lib/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 571, in transform
    .format(self._n_features, X.shape[1]))
ValueError: Number of features of the input must be equal to or greater than that of the fitted transformer. Transformer n_features is 67 and input n_features is 1.

score 0 · Accepted Answer

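You can loop by index and predict on a slice: self.data[index:index+1] selects a single row while keeping it a one-row DataFrame, so the encoder still sees every column.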
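A minimal sketch of the slicing approach follows. It is not the asker's actual pipeline: the two-column data, the ColumnTransformer setup, and the `results` keys are made up for illustration, and the Keras prediction step is left out. The likely reason the original attempts failed is shape: iterrows() yields each row as a 1-D Series, and pd.DataFrame(row) turns that Series into a single column, whereas a slice keeps the row as a (1, n_columns) DataFrame.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data standing in for the asker's 67 features.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.0, 3.0]})

# Assumed encoder: one-hot encode "color", pass "size" through unchanged.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["color"])],
    remainder="passthrough",
)
encoder.fit(train)

# New data; "green" was never seen during fitting.
data = pd.DataFrame({"color": ["blue", "green", "red"], "size": [4.0, 5.0, 6.0]})

# Wrapping a row Series in a DataFrame gives one column, not one row.
column_frame = pd.DataFrame(data.iloc[0])  # shape (n_columns, 1)

results = {"rows": [], "encoded": []}
for pos in range(len(data)):
    row_df = data[pos:pos + 1]  # one-row DataFrame, shape (1, n_columns)
    results["rows"].append(row_df.index[0])
    try:
        # The transform call sits inside the try block so that a bad row
        # is recorded instead of stopping the loop.
        results["encoded"].append(encoder.transform(row_df))
    except ValueError:
        # OneHotEncoder raises ValueError on unknown categories by default.
        results["encoded"].append("ERROR")
```

In the asker's loop, the call to self._model.predict would go inside the same try block, after the transform. data.iloc[[pos]] selects the same one-row DataFrame as the slice.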
