Does cudf support pandas get_dummies?
In pandas, I can do the following:
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
Yes! It works for a DataFrame:
In [9]: import pandas as pd
In [10]: pdf = pd.DataFrame({"id":[1,2,3,4,5,6], "grade":['a', 'b', 'b', 'a', 'a', 'e']})
In [11]: pdf["grade"] = pdf["grade"].astype("category")
In [12]: gdf = cudf.DataFrame.from_pandas(pdf)
In [13]: cudf.get_dummies(gdf)
Out[13]:
   id  grade_a  grade_b  grade_e
0   1        1        0        0
1   2        0        1        0
2   3        0        1        0
3   4        1        0        0
4   5        1        0        0
5   6        0        0        1
However, it fails for a Series:
In [14]: sr = cudf.Series(list('abca')).astype("category")
In [15]: cudf.get_dummies(sr)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-ee336b3bc1cf> in <module>
----> 1 cudf.get_dummies(sr)
/datasets/bzaitlen/miniconda3/envs/cudf_dev10.1/lib/python3.7/site-packages/cudf/core/reshape.py in get_dummies(df, prefix, prefix_sep, dummy_na, columns, cats, sparse, drop_first, dtype)
295
296 if columns is None or len(columns) == 0:
--> 297 columns = df.select_dtypes(include=encode_fallback_dtypes).columns
298
299 def length_check(obj, name):
AttributeError: 'Series' object has no attribute 'select_dtypes'
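Since the traceback shows get_dummies calling select_dtypes, which only a DataFrame has, one possible workaround is to wrap the Series in a one-column DataFrame first. The sketch below is untested against this exact cudf version; the helper name series_get_dummies and the column name "s" are made up for illustration, and the pandas fallback is there only so the snippet also runs on a machine without a GPU.

```python
try:
    import cudf as xdf  # GPU path; assumes a working cudf install
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch still runs without cudf


def series_get_dummies(sr, name="s"):
    """Hypothetical helper: one-hot encode a Series via a one-column DataFrame."""
    # to_frame gives get_dummies an object that has select_dtypes,
    # which is the attribute the Series was missing in the traceback.
    return xdf.get_dummies(sr.to_frame(name=name))


sr = xdf.Series(list("abca")).astype("category")
dummies = series_get_dummies(sr)
print(dummies)
```

The resulting columns are prefixed with the column name (s_a, s_b, s_c here), unlike pandas get_dummies on a bare Series, which uses the category values alone as column names.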