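I need to encode the categorical values in my test set, but doing so throws TypeError: argument must be a string or number.
I don't know why this happens, because the same encoding works on my training set. These are the train/test feature sets from the same split, so they have identical columns; the only difference is the number of rows. I don't know how to solve this. I tried using a separate LabelEncoder for each set, but that did not fix the error. Can someone please help?
For reference, the categorical data is in column 8 of both the training and test feature sets.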
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
import scipy.stats as ss
avo_sales = pd.read_csv('avocados.csv')
avo_sales.rename(columns={'4046': 'small PLU sold',
                          '4225': 'large PLU sold',
                          '4770': 'xlarge PLU sold'},
                 inplace=True)
avo_sales.columns = avo_sales.columns.str.replace(' ','')
x = np.array(avo_sales.drop(['TotalBags','Unnamed:0','year','region','Date'], axis=1))
y = np.array(avo_sales.TotalBags)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
impC = SimpleImputer(strategy='most_frequent')
X_train[:,8] = impC.fit_transform(X_train[:,8].reshape(-1,1)).ravel()
imp = SimpleImputer(strategy='median')
X_train[:,1:8] = imp.fit_transform(X_train[:,1:8])
le = LabelEncoder()
X_train[:,8] = le.fit_transform(X_train[:,8])
X_test[:,8] = le.fit_transform(X_test[:,8])
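
One possible explanation, offered as a guess rather than a confirmed diagnosis: in the code above both imputers are applied only to X_train, so column 8 of X_test may still contain NaN. A column that mixes strings with float NaN is something LabelEncoder may be unable to sort, depending on the scikit-learn version, and that could produce this TypeError. The sketch below uses small made-up arrays (train_col and test_col are hypothetical stand-ins for column 8 of X_train and X_test) to show the usual pattern: fit the imputer and the encoder on the training data, then call only transform on the test data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for column 8 of X_train / X_test (object dtype, with NaN).
train_col = np.array(["conventional", "organic", np.nan, "organic"], dtype=object)
test_col = np.array(["organic", np.nan, "conventional"], dtype=object)

# Fit the imputer on the training column only, then reuse it on the test column.
imp_c = SimpleImputer(strategy="most_frequent")
train_filled = imp_c.fit_transform(train_col.reshape(-1, 1)).ravel()
test_filled = imp_c.transform(test_col.reshape(-1, 1)).ravel()

# Fit the encoder on the training column only, then reuse it on the test column
# so that both sets share the same label-to-integer mapping.
le = LabelEncoder()
train_codes = le.fit_transform(train_filled)
test_codes = le.transform(test_filled)

print(le.classes_)
print(train_codes, test_codes)
```

Reusing the fitted objects keeps the mapping consistent between the two sets. Note that le.transform raises a ValueError if the test column contains a category that never appeared in the training column.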