
Does cudf support pandas' get_dummies? In pandas, I can do the following:

>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

1 Answer


Yes!

In [9]: import pandas as pd

In [10]: pdf = pd.DataFrame({"id":[1,2,3,4,5,6], "grade":['a', 'b', 'b', 'a', 'a', 'e']})

In [11]: pdf["grade"] = pdf["grade"].astype("category")

In [12]: gdf = cudf.DataFrame.from_pandas(pdf)

In [13]: cudf.get_dummies(gdf)
Out[13]:
   id  grade_a  grade_b  grade_e
0   1        1        0        0
1   2        0        1        0
2   3        0        1        0
3   4        1        0        0
4   5        1        0        0
5   6        0        0        1

However, it fails when given a Series:

In [14]: sr = cudf.Series(list('abca')).astype("category")

In [15]: cudf.get_dummies(sr)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-ee336b3bc1cf> in <module>
----> 1 cudf.get_dummies(sr)

/datasets/bzaitlen/miniconda3/envs/cudf_dev10.1/lib/python3.7/site-packages/cudf/core/reshape.py in get_dummies(df, prefix, prefix_sep, dummy_na, columns, cats, sparse, drop_first, dtype)
    295
    296     if columns is None or len(columns) == 0:
--> 297         columns = df.select_dtypes(include=encode_fallback_dtypes).columns
    298
    299     def length_check(obj, name):

AttributeError: 'Series' object has no attribute 'select_dtypes'
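
The traceback shows get_dummies calling select_dtypes, which only the DataFrame has. A possible workaround, sketched here on the assumption that Series.to_frame is available in your cudf version, is to wrap the Series in a single-column DataFrame so the call takes the DataFrame path that worked above. The column name "grade" and the pandas fallback import are illustrative choices, not part of the session above; newer cudf releases may accept a Series directly, so check your version first.

```python
# Use cudf when it is installed; otherwise fall back to pandas, whose
# Series / get_dummies API this sketch mirrors (fallback is illustrative only).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

sr = xdf.Series(list("abca")).astype("category")

# Wrap the Series in a one-column DataFrame. "grade" is an arbitrary name
# (an assumption) that becomes the prefix of the generated columns.
frame = sr.to_frame(name="grade")

# get_dummies now receives a DataFrame, so select_dtypes is available.
dummies = xdf.get_dummies(frame)
print(dummies)
```

The result should have one indicator column per category, named grade_a, grade_b and grade_c.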
Answered 2019-11-12T16:35:09.973