python - Pandas - Make a column dtype object or Factor

Question

In pandas, how can I convert a column of a DataFrame to dtype object? Or better yet, into a factor? (For those who speak R: in Python, how do I do as.factor()?)

Also, what is the difference between pandas.Factor and pandas.Categorical?

score 83 · Accepted Answer

You can use the astype method to convert a Series (one column):

df['col_name'] = df['col_name'].astype(object)

or the whole DataFrame:

df = df.astype(object)
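A minimal sketch of both conversions, assuming a small hypothetical DataFrame with an integer column `a` and a string column `b`. It checks the resulting dtypes instead of relying on printed output:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": ["yes", "no", "yes"]})

# Convert a single column (a Series) to the generic object dtype
df["a"] = df["a"].astype(object)
print(df["a"].dtype)  # object

# Convert every column of the DataFrame at once
df_obj = df.astype(object)
print(df_obj.dtypes)
```

The values themselves are unchanged; only the storage dtype changes, so `df["a"]` still holds 1, 2 and 3.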

Update

Since version 0.15, you can use the category dtype for a Series/column:

df['col_name'] = df['col_name'].astype('category')
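For readers coming from R, here is a sketch showing how a `category` column maps onto factor concepts. It uses hypothetical data matching the example in the other answers. `.cat.categories` plays the role of `levels()` and `.cat.codes` holds the integer codes. Note that pandas codes are 0-based, unlike R's 1-based factor codes:

```python
import pandas as pd

df = pd.DataFrame({"b": ["yes", "no", "yes", "no", "absent"]})

# Roughly the equivalent of R's as.factor()
df["b"] = df["b"].astype("category")

print(df["b"].dtype)                 # category
print(list(df["b"].cat.categories))  # ['absent', 'no', 'yes'] (sorted, like R levels)
print(df["b"].cat.codes.tolist())    # [2, 1, 2, 1, 0]
```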

Note: pd.Factor has been deprecated and removed in favor of pd.Categorical.
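A sketch of using `pd.Categorical` directly, assuming you want to fix the order of the levels yourself (similar to passing `levels=` to R's `factor()`). Values not listed in `categories` get the code -1 and are treated as missing:

```python
import pandas as pd

# `categories` sets the level order explicitly
cat = pd.Categorical(["yes", "no", "yes", "no", "absent"],
                     categories=["no", "yes", "absent"])

print(list(cat.categories))  # ['no', 'yes', 'absent']
print(cat.codes.tolist())    # [1, 0, 1, 0, 2]

# Values outside `categories` become missing, with code -1
partial = pd.Categorical(["yes", "maybe"], categories=["no", "yes"])
print(partial.codes.tolist())  # [1, -1]

# A Categorical can be assigned straight into a DataFrame column
df = pd.DataFrame({"b": cat})
print(df["b"].dtype)  # category
```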

score 17 · Answer

There's also the pd.factorize function:

# use the df data from @herrfz

In [150]: pd.factorize(df.b)
Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))
In [152]: df['c'] = pd.factorize(df.b)[0]

In [153]: df
Out[153]: 
   a       b  c
0  1     yes  0
1  2      no  1
2  3     yes  0
3  4      no  1
4  5  absent  2
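A short sketch of what `pd.factorize` returns. It rebuilds the same hypothetical `df` as in the session. By default the codes follow order of first appearance. `sort=True` sorts the uniques first, which gives the same codes as `pd.Categorical`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": ["yes", "no", "yes", "no", "absent"]})

# Default: codes follow order of first appearance
codes, uniques = pd.factorize(df["b"])
print(codes.tolist())  # [0, 1, 0, 1, 2]
print(list(uniques))   # ['yes', 'no', 'absent']

# Indexing the uniques with the codes recovers the original values
print(list(uniques[codes]))  # ['yes', 'no', 'yes', 'no', 'absent']

# sort=True sorts the uniques, matching pd.Categorical's codes
sorted_codes, sorted_uniques = pd.factorize(df["b"], sort=True)
print(sorted_codes.tolist())  # [2, 1, 2, 1, 0]
```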

score 12 · Answer

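As far as I know, Factor and Categorical are the same. I think it was originally called Factor and then renamed to Categorical. To convert to a categorical, you can use pandas.Categorical.from_array, like this (note: from_array and .labels belong to old pandas versions and no longer exist; the current equivalent is pd.Categorical(df.b).codes):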

In [27]: df = pd.DataFrame({'a' : [1, 2, 3, 4, 5], 'b' : ['yes', 'no', 'yes', 'no', 'absent']})

In [28]: df
Out[28]: 
   a       b
0  1     yes
1  2      no
2  3     yes
3  4      no
4  5  absent

In [29]: df['c'] = pd.Categorical.from_array(df.b).labels

In [30]: df
Out[30]: 
   a       b  c
0  1     yes  2
1  2      no  1
2  3     yes  2
3  4      no  1
4  5  absent  0
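`Categorical.from_array` and `.labels` are no longer available in current pandas. Here is a sketch of the same computation with the current API, assuming pandas 0.15 or later, where `.codes` replaced `.labels`. Both new columns should match the `c` column in the session above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": ["yes", "no", "yes", "no", "absent"]})

# Replacement for Categorical.from_array(df.b).labels:
# construct the Categorical directly and read its integer codes
df["c"] = pd.Categorical(df["b"]).codes

# Equivalent via the .cat accessor on a category-typed Series
df["c2"] = df["b"].astype("category").cat.codes

print(df)
```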
