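I am running a very basic piece of code that fits an encoder and then uses the same encoder to encode a new dataframe. In this snippet I do not actually need np.save and np.load, but in my real implementation I have to reload the encoder to transform a new dataframe. I am trying to understand how to fit an encoder on one dataframe, then load that encoder in another script and transform a new dataframe.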
from sklearn.preprocessing import LabelEncoder
import pickle as cPickle
import numpy as np
import pandas as pd
df_test = pd.DataFrame({'A': [1, 2, 3, 4],
'B': ["Yes", "No", "Yes", "Yes"],
'C': ["Yes", "No", "No", "Yes"],
'D': ["No", "Yes", "No", "Yes"]})
le = LabelEncoder()
df_test['A'] = le.fit_transform(df_test['A'])
le_dict = dict(zip(le.classes_, le.transform(le.classes_)))
class_name = 'classes_' +'A' + '.npy'
np.save(class_name, le_dict, allow_pickle=True)
print(df_test)
print(le.classes_)
le.classes_ = np.load('classes_A.npy', allow_pickle = True)
print(le.classes_)
df_new = pd.DataFrame({'A': [1, 2, 3, 4],
'B': ["Yes", "No", "Yes", "Yes"],
'C': ["Yes", "No", "No", "Yes"],
'D': ["No", "Yes", "No", "Yes"]})
df_new['A'] = le.transform(df_new['A'])
This gives me the following error:
File "<ipython-input-42-a1aa630ec7e8>", line 1, in <module>
df_new['A'] = le.transform(df_new['A'])
File "/Users/usr/opt/anaconda3/envs/signals_gcp_py36/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 257, in transform
_, y = _encode(y, uniques=self.classes_, encode=True)
File "/Users/usr/opt/anaconda3/envs/signals_gcp_py36/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 110, in _encode
return _encode_numpy(values, uniques, encode)
File "/Users/usr/opt/anaconda3/envs/signals_gcp_py36/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 49, in _encode_numpy
% str(diff))
ValueError: y contains previously unseen labels: [1, 2, 3, 4]
When I print le.classes_ before loading it from disk, it looks like this:
array([1, 2, 3, 4])
But when I print it after np.load(), it looks like this:
{1: 0, 2: 1, 3: 2, 4: 3}
Here is more information about le.classes_ after np.load():
In []: le.classes_
Out[]: array({1: 0, 2: 1, 3: 2, 4: 3}, dtype=object)
In []: type(le.classes_)
Out[]: numpy.ndarray
In []: print(le.classes_)
Out[]: {1: 0, 2: 1, 3: 2, 4: 3}
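The output above looks consistent with how np.save handles a non-array object: the dict appears to be wrapped in a zero-dimensional object array, so le.classes_ ends up holding a single element (the dict) instead of the four labels. A minimal sketch of that behavior, assuming a hypothetical file name demo_classes.npy written to a temporary directory:

```python
import os
import tempfile

import numpy as np

# Same kind of mapping as le_dict in the question (label -> code).
le_dict = {1: 0, 2: 1, 3: 2, 4: 3}

# Hypothetical file location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo_classes.npy")
np.save(path, le_dict, allow_pickle=True)

loaded = np.load(path, allow_pickle=True)

# The dict comes back as a 0-d object array: one element (the dict
# itself), not one element per label.
print(type(loaded), loaded.shape, loaded.dtype)

# .item() unwraps the 0-d array and returns the original dict.
recovered = loaded.item()
print(recovered)
```

If that is what is happening, the labels themselves are never stored as a 1-D array, which would explain why transform does not recognize them.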
I am trying to understand how these functions work. I ran the same code for column B and got a different error.
from sklearn.preprocessing import LabelEncoder
import pickle as cPickle
import numpy as np
import pandas as pd
df_test = pd.DataFrame({'A': [1, 2, 3, 4],
'B': ["Yes", "No", "Yes", "Yes"],
'C': ["Yes", "No", "No", "Yes"],
'D': ["No", "Yes", "No", "Yes"]})
le = LabelEncoder()
df_test['B'] = le.fit_transform(df_test['B'])
le_dict = dict(zip(le.classes_, le.transform(le.classes_)))
class_name = 'classes_' +'B' + '.npy'
np.save(class_name, le_dict, allow_pickle=True)
print(df_test)
print(le.classes_)
le.classes_ = np.load('classes_B.npy', allow_pickle = True)
print(le.classes_)
df_new = pd.DataFrame({'A': [1, 2, 3, 4],
'B': ["Yes", "No", "Yes", "Yes"],
'C': ["Yes", "No", "No", "Yes"],
'D': ["No", "Yes", "No", "Yes"]})
df_new['B'] = le.transform(df_new['B'])
The error is TypeError: argument must be a string or number.
Here is the full stack trace:
Traceback (most recent call last):
File "<ipython-input-71-a0243d411c34>", line 1, in <module>
df_new['B'] = le.transform(df_new['B'])
File "/Users/usr/opt/anaconda3/envs/signals_gcp_py36/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 257, in transform
_, y = _encode(y, uniques=self.classes_, encode=True)
File "/Users/usr/opt/anaconda3/envs/signals_gcp_py36/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 107, in _encode
raise TypeError("argument must be a string or number")
TypeError: argument must be a string or number
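For comparison, here is a rough sketch of the workflow I am aiming for, with one change from the code above: it saves the le.classes_ array itself rather than a dict built from it. The temporary directory and the le_loaded name are hypothetical, and assigning classes_ on a fresh LabelEncoder is an assumption about how the encoder can be restored, not something confirmed by the scikit-learn documentation here.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df_test = pd.DataFrame({"B": ["Yes", "No", "Yes", "Yes"]})

# Fit the encoder on the first dataframe.
le = LabelEncoder()
df_test["B"] = le.fit_transform(df_test["B"])

# Save the fitted classes array directly (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "classes_B.npy")
np.save(path, le.classes_, allow_pickle=True)

# Later, possibly in another script: rebuild an encoder from the file.
# Assumption: setting classes_ is enough for transform to work.
le_loaded = LabelEncoder()
le_loaded.classes_ = np.load(path, allow_pickle=True)

df_new = pd.DataFrame({"B": ["Yes", "No", "Yes", "Yes"]})
df_new["B"] = le_loaded.transform(df_new["B"])
print(df_new)
```

Another option would be to serialize the whole encoder object with pickle.dump and restore it with pickle.load, which avoids touching classes_ by hand.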
Any help is appreciated!