python - raise ValueError("Input contains NaN") ValueError: Input contains NaN when trying to build a machine learning model

Question

I am trying to build a prediction model, but I keep getting the error: raise ValueError("Input contains NaN") ValueError: Input contains NaN. I tried checking the data with np.all(np.isfinite(dataframe)) and np.any(np.isnan(dataframe)), but I keep getting new errors, for example: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.

Here is my code so far:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
import numpy as np

dataframe = pd.read_csv('file.csv', delimiter=',')

le = LabelEncoder()
dfle = dataframe

dfle2 = dfle.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')

newdf = dfle2[['column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']]

X = dataframe[['column1', 'column2', 'column4', 'column5', 'column6', 'column7']].values

y = dfle.column3

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ohe = OneHotEncoder()

ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# np.all(np.isfinite(dfle))
# np.any(np.isnan(dfle))
X = ohe.fit_transform(X).toarray()
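Before choosing a fix, it helps to see which columns actually contain missing values. Note that in the code above X is built from the raw dataframe, not from the encoded dfle2, so any NaNs in file.csv reach OneHotEncoder unchanged. pandas' isna() works on every dtype, including text columns where np.isnan fails. A minimal sketch, using a small made-up DataFrame as a stand-in for file.csv (the data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for file.csv: one text column and one numeric column, both with a gap
df = pd.DataFrame({
    "column1": ["a", None, "c"],
    "column2": [1.0, np.nan, 3.0],
})

# isna() handles object (string) columns too, unlike np.isnan
nan_per_column = df.isna().sum()
print(nan_per_column)

# Rows that contain at least one missing value
rows_with_nan = df[df.isna().any(axis=1)]
print(rows_with_nan)
```

On the real data, replacing df with dataframe shows how many NaNs each column has before deciding how to fill them.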

score 0 · Accepted Answer

There are several things you can do to handle this error. First, you can fill the NaN values with 0:

dataframe = pd.read_csv('file.csv', delimiter=',').fillna(0)
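A small sketch of what fillna(0) does on mixed data, using a made-up DataFrame. A blanket 0 also lands in text columns, so the sketch also shows fillna with a per-column dict (a standard pandas option) as a variant that keeps text columns textual; the column names and fill values are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a text column and a numeric column, both with a NaN in row 1
df = pd.DataFrame({
    "category": ["a", np.nan, "b"],
    "amount": [10.0, np.nan, 30.0],
})

# Blanket replacement, as suggested above: every NaN becomes 0 (even in the text column)
filled_zero = df.fillna(0)

# Per-column fill values: text gets a placeholder label, numbers get 0
filled_per_column = df.fillna({"category": "missing", "amount": 0})
print(filled_per_column)
```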

Alternatively, you can use scikit-learn's imputation techniques to fill the NaN values:

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute
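One way imputation could fit into the OneHotEncoder/ColumnTransformer setup from the question is to put an imputer in front of each encoder, so NaNs are replaced before encoding. This is a minimal sketch under assumptions: the columns color and size are hypothetical, and SimpleImputer (constant label for text, median for numbers) is chosen here purely for illustration:

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with a missing category and a missing number
df = pd.DataFrame({
    "color": ["red", np.nan, "blue", "red"],
    "size": [1.0, 2.0, np.nan, 4.0],
})

preprocess = ColumnTransformer([
    # Text column: fill NaN with a placeholder label, then one-hot encode it
    ("cat", make_pipeline(SimpleImputer(strategy="constant", fill_value="missing"),
                          OneHotEncoder(handle_unknown="ignore")), ["color"]),
    # Numeric column: fill NaN with the column median
    ("num", SimpleImputer(strategy="median"), ["size"]),
])

X = preprocess.fit_transform(df)
# ColumnTransformer may return a sparse matrix; densify for inspection
if sparse.issparse(X):
    X = X.toarray()
print(X)
```

Fitting the imputers inside the transformer also means the same fill values are reused when transforming test data.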

Several imputation techniques are available, but you should use KNNImputer.
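A minimal KNNImputer sketch on a made-up numeric array. Each missing entry is replaced by the mean of that feature over the n_neighbors nearest rows that have it, with distances computed on the features that are present. KNNImputer only accepts numeric input, so text columns would need to be encoded first; because it is distance-based, scaling the features beforehand is usually advisable. The data and n_neighbors=2 are illustrative assumptions:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical numeric data; row 1 is missing its second feature
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [8.0, 16.0],
])

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

# Rows 0 and 2 are the two nearest neighbours of row 1 (by the first feature),
# so the gap becomes the mean of 2.0 and 6.0
print(X_filled[1])
```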

score 0 · Accepted Answer

The error

TypeError: ufunc 'isfinite' not supported for the input types,
and the inputs could not be safely coerced to any supported types
according to the casting rule ''safe''

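is probably because you are converting the data to strings with col.astype(str): np.isnan and np.isfinite only work on numeric arrays and raise this TypeError on string (object) data. Use something like astype(float) instead.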
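A short sketch reproducing the TypeError on an object (string) column and showing that it disappears after astype(float). The column contents are made up; astype(float) only works when every string looks like a number, otherwise pd.to_numeric(col, errors="coerce") turns unparseable values into NaN instead of raising:

```python
import numpy as np
import pandas as pd

# Hypothetical column read as text (object dtype), with one missing value
col = pd.Series(["1.5", "2.0", np.nan], dtype=object)

raised = False
try:
    np.isfinite(col.to_numpy())  # object array: no isfinite loop exists
except TypeError as exc:
    raised = True
    print("object dtype:", exc)

# After converting to float, the NaN check works as expected
as_float = col.astype(float)
finite = np.isfinite(as_float.to_numpy())
print(finite)
```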

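As for the NaN error, you need to decide whether replacing the NaNs with zero (fillna(0)) is acceptable for your data, or whether you need something more sophisticated, such as a Kalman filter.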
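For time-ordered numeric data, a Kalman filter can fill gaps with a prediction based on the preceding observations. This is a minimal sketch, not a general implementation: it assumes a 1-D series and a simple random-walk ("local level") model, and kalman_fill, process_var and measurement_var are hypothetical names with arbitrary default values. It only filters forward; a smoother would also use later observations:

```python
import numpy as np


def kalman_fill(series, process_var=1e-3, measurement_var=1e-1):
    """Fill NaNs in a 1-D time-ordered series with a local-level Kalman filter.

    Assumed model: the true value follows a random walk
    x_t = x_{t-1} + w_t, w_t ~ N(0, process_var), and each observation is
    z_t = x_t + v_t, v_t ~ N(0, measurement_var).
    Observed values are kept; NaNs are replaced by the predicted state.
    """
    z = np.asarray(series, dtype=float)
    observed = ~np.isnan(z)
    if not observed.any():
        raise ValueError("series has no observed values")

    x = z[observed][0]  # initial state: first observed value
    p = 1.0             # initial state variance (arbitrary choice)
    out = z.copy()
    for t, value in enumerate(z):
        # Predict: a random walk keeps the mean and inflates the variance
        p = p + process_var
        if np.isnan(value):
            out[t] = x  # no measurement: use the prediction as the fill value
        else:
            k = p / (p + measurement_var)  # Kalman gain
            x = x + k * (value - x)        # update the state with the observation
            p = (1.0 - k) * p              # shrink the state variance
    return out


print(kalman_fill([1.0, 1.0, np.nan, 1.0]))
```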
