
I have a dataset, and after going through it I did not find any NaN values. Unfortunately, when I run my code I still get TypeError: ufunc 'isnan' not supported.

Dataset link: https://docs.google.com/spreadsheets/d/1v5HQLrCuJXWLTLoaHFa5WPqakjpglEgEMGC9TLEwbjY/edit?usp=sharing

import pandas as pd
import numpy as np

dfData = pd.read_csv('datasets/disambiguate_spam_sms.csv', encoding="latin-1")

from sklearn.model_selection import train_test_split
training_indices, validation_indices = training_indices, testing_indices = train_test_split(sms_label,
                                                                                                stratify = sms_label,
                                                                                                train_size=0.75, test_size=0.25)

training_indices.size, validation_indices.size

from tpot import TPOTClassifier
from tpot import TPOTRegressor

tpot = TPOTClassifier(generations=5, verbosity=2)

tpot.fit(sms_data.drop('label',axis=1).loc[training_indices].values,
             sms_data.loc[training_indices,'label'].values)

Below is the error output. It keeps pointing to sms_data.loc[training_indices,'label'].values)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-102-3df5d5a2120f> in <module>()
      5 
      6 tpot.fit(sms_data.drop('label',axis=1).loc[training_indices].values,
----> 7          sms_data.loc[training_indices,'label'].values)

/home/emma/.local/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    658         """
    659         self._fit_init()
--> 660         features, target = self._check_dataset(features, target, sample_weight)
    661 
    662 

/home/emma/.local/lib/python3.6/site-packages/tpot/base.py in _check_dataset(self, features, target, sample_weight)
   1175         else:
   1176             if isinstance(features, np.ndarray):
-> 1177                 if np.any(np.isnan(features)):
   1178                     self._imputed = True
   1179             elif isinstance(features, DataFrame):

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

1 Answer


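First, I am not sure you are using train_test_split correctly. I believe your first = is misplaced: the call returns two values, so it should be unpacked into two names with a single assignment.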
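As a minimal sketch of what a single-assignment call could look like: the toy labels and row indices below are hypothetical stand-ins for the real 'label' column and the DataFrame index, and random_state is added only to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels standing in for the 'label' column of the dataset.
labels = np.array(["ham", "spam"] * 4)
# Row positions standing in for the DataFrame index.
row_indices = np.arange(len(labels))

# One assignment, two names: the split returns the train part and the test part.
training_indices, validation_indices = train_test_split(
    row_indices,
    stratify=labels,      # keep the ham/spam ratio the same in both parts
    train_size=0.75,
    test_size=0.25,
    random_state=0,       # assumption: fixed seed for reproducibility only
)

print(training_indices.size, validation_indices.size)
```

With eight rows, this yields six training indices and two validation indices, which can then be used with .loc to select rows.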

Second, I agree with @Humi: are you sure you are passing the right X and y values to the model? Run

print(
    sms_data.drop('label',axis=1).loc[training_indices].values,
    sms_data.loc[training_indices, 'label'].values
)

and verify that this is indeed the data you expect.

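The error you are getting means that the data being passed has the wrong data type, i.e. it is probably not numeric (?), so it cannot be checked with the isnan function.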
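A small sketch of how that error can arise, assuming the feature array holds raw text: the two message strings below are made up for illustration and are not taken from the dataset. An array of Python strings has dtype object, and np.isnan raises the same TypeError on it, while a float array is checked without trouble.

```python
import numpy as np

# Hypothetical text features, similar in kind to what .values returns
# for a DataFrame column of SMS messages.
features = np.array([["Free entry now"], ["See you at 5"]], dtype=object)
print(features.dtype)  # object

try:
    np.isnan(features)
    raised = False
except TypeError as err:
    # Same message as in the traceback: ufunc 'isnan' not supported ...
    raised = True
    print(err)

# A numeric array works, and a real NaN is detected.
numeric = np.array([[1.0, 0.0], [0.0, np.nan]])
has_nan = bool(np.any(np.isnan(numeric)))
print(has_nan)
```

Checking the dtype of the array passed to fit is therefore a quick first test: if it is object, the text would need to be encoded as numbers before the NaN check can run.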

answered 2020-04-09T21:24:20.023