python-3.x - Why doesn't df[0] return the first column after concatenation, and how do I use ohe.categories_?

Question

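import pandas as pd
from sklearn.preprocessing import OneHotEncoder

dfcars = pd.read_excel('cars.xlsx')
ohe = OneHotEncoder()
# One-hot encode the 'Car Model' column
temp1 = pd.DataFrame(ohe.fit_transform(dfcars[['Car Model']]).toarray())
ohe.categories_
dfcars = pd.concat([dfcars, temp1], axis=1)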

dfcars

dfcars.columns

Although dfcars has been concatenated with temp1, dfcars[0] returns the 4th column and dfcars[4] raises an error.

dfcars[0]

Why does this happen?

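Also, I have searched everywhere but cannot find how to use the categories_ attribute of OneHotEncoder, so please explain what it does and the correct syntax for using it.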

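I tried the following, but, possibly because of the problem above, the original columns disappeared and every value was filled with NaN.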

df_drop=dfcars.drop(['Car Model'],axis=1)
df_drop = pd.DataFrame(data= df_drop, columns= ohe.categories_)

df_drop

score 0 · Accepted Answer

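This is probably because dfcars[0] uses the df[column_label] syntax, which selects a column by its label rather than by its position. The columns of temp1 received default integer labels, so you have a column labelled 0 but no column labelled 4. You can name the columns before concatenating: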

temp1= pd.DataFrame(ohe.fit_transform(dfcars[['Car Model']]).toarray(),columns=['Category_0', 'Category_1', 'Category_2'])
dfcars = pd.concat([dfcars,temp1], axis=1)
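
To illustrate the difference between label-based and position-based selection, here is a minimal sketch. The small frame stands in for cars.xlsx, and the column names other than 'Car Model' are hypothetical, since the original spreadsheet is not shown.

```python
import pandas as pd

# Hypothetical stand-in for cars.xlsx (only 'Car Model' comes from the question)
dfcars = pd.DataFrame({
    "Car Model": ["Audi", "BMW", "Audi"],
    "Mileage": [10, 20, 30],
    "Price": [1.0, 2.0, 3.0],
    "Age": [1, 2, 3],
})

# A frame built from a bare array gets default integer column labels 0, 1, ...
temp1 = pd.DataFrame([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

combined = pd.concat([dfcars, temp1], axis=1)

# df[key] looks the key up among the column labels
by_label = combined[0]

# .iloc selects by position, so this is the first column regardless of its label
by_position = combined.iloc[:, 0]

# No column carries the label 4, so label-based lookup fails
try:
    combined[4]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
```

If positional access is what you want, .iloc avoids depending on whatever labels the concatenated columns happen to carry.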

For the categories_ attribute of OneHotEncoder, you can refer to the sklearn documentation.
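
As a rough sketch of one way to use it: after fitting, categories_ holds a list with one array per encoded input column, and each array contains the categories found in that column. Its first element can therefore serve as the column names for the encoded frame. The sample data below is hypothetical and stands in for cars.xlsx.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for cars.xlsx
dfcars = pd.DataFrame({
    "Car Model": ["Audi", "BMW", "Audi", "Kia"],
    "Price": [3.0, 4.0, 3.5, 2.0],
})

ohe = OneHotEncoder()
encoded = ohe.fit_transform(dfcars[["Car Model"]]).toarray()

# Only one column was encoded, so the list has a single array of category names
models = ohe.categories_[0]

# Name the encoded columns when the frame is created, instead of relabelling later
temp1 = pd.DataFrame(encoded, columns=models, index=dfcars.index)

# Drop the original text column and attach the encoded columns
result = pd.concat([dfcars.drop(columns=["Car Model"]), temp1], axis=1)
```

Note that passing an existing DataFrame as data together with columns= selects those labels from it rather than renaming its columns, so labels that are not present come back as all-NaN columns. That may be what produced the NaN values in the attempt from the question.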
