python - How do I pass a cuDF DataFrame to cuML.ensemble.RandomForestClassifier?

Question

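I am trying to fit data to cuml.ensemble.RandomForestClassifier, but I keep getting the error message: "The labels need to be consecutive values from 0 to the number of unique label values"

I am passing cudf.DataFrame objects to the function; they have the same number of rows but different numbers of columns. The column labels start at 0 and increase by 1 up to the last column (108 in the example below). What am I doing wrong? Below are printouts of the DataFrames I am passing, along with some context code: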

clf1 = modelClass(max_depth=D1, random_state=random.randrange(0, 1024, 1), n_bins=15, n_streams=4, split_criterion=criterion, bootstrap=bootstrap, n_estimators=trs1)

clf1.fit(X1, Y1)

The X1 DataFrame looks like this:

	0	1	2
0	1.000000e-11	1.000000e-11	1.647421e-01
1	1.000000e-11	1.000000e-11	1.760000e-02
2	1.000000e-11	1.000000e-11	-1.772000e-01
3	1.000000e-11	1.000000e-11	8.254000e-01
4	1.000000e-11	1.000000e-11	2.587000e-01
...	...	...	...
5402	1.000000e-11	1.000000e-11	1.704444e-01
5403	1.000000e-11	1.000000e-11	-1.860000e-01
5404	0.000000e+00	1.000000e-11	1.229714e-01
5405	1.000000e-11	1.959500e-01	1.984667e-01
5406	1.000000e-11	1.000000e-11	1.000000e-11

[5407 rows x 3 columns]; dtype=('0', dtype('float64')); <cudf.core.dataframe._DataFrameLocIndexer object at 0x7f9c3d0f3070>

The Y1 DataFrame looks like this:

	0
0	-2.0
1	4.0
2	-3.0
3	1.0
4	0.0
...	...
5402	0.0
5403	-2.0
5404	0.0
5405	0.0
5406	0.0

[5407 rows x 1 columns]; dtype=('0', dtype('float64')); <cudf.core.dataframe._DataFrameLocIndexer object at 0x7f9c1b847b50>
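The condition named in the error message quoted above can be checked by hand. The following is a minimal plain-Python sketch, not cuML's actual validation code; `labels_are_consecutive` is a hypothetical helper, and `sample_labels` holds only the distinct values visible in the Y1 printout.

```python
def labels_are_consecutive(labels):
    """Return True if the distinct labels are exactly 0, 1, ..., k-1."""
    unique = sorted(set(labels))
    return unique == list(range(len(unique)))


# Distinct label values visible in the Y1 printout (the full column may hold more).
sample_labels = [-2.0, 4.0, -3.0, 1.0, 0.0]

print(labels_are_consecutive(sample_labels))    # negative values and gaps present
print(labels_are_consecutive([0, 1, 2, 3, 4]))  # already consecutive from 0
```

The first call reports False because the labels include negative values and skip 2 and 3, which is consistent with the error being raised.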

System info: Ubuntu 20.04, Titan RTX, CUDA 11.5, RAPIDS 21.12 installed via Conda, Python 3.8

score 0 · Accepted Answer

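In the end, you need to encode the Y1 DataFrame before passing it in. The Y1 labels shown in the question include negative and non-consecutive values (-3.0, -2.0, 0.0, 1.0, 4.0), whereas the error message says the classifier expects consecutive labels starting at 0: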

import cuml.preprocessing

# Map the raw label values to consecutive integers starting at 0
enc = cuml.preprocessing.LabelEncoder()

Y1 = enc.fit_transform(Y1)
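To illustrate what label encoding does conceptually, here is a small plain-Python sketch. It is not cuML's implementation: `encode_labels` and `decode_labels` are hypothetical helpers, and the sorted ordering of classes is an assumption modeled on how scikit-learn's LabelEncoder behaves.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code 0..k-1 (sorted order assumed)."""
    classes = sorted(set(labels))
    index_of = {value: i for i, value in enumerate(classes)}
    return [index_of[value] for value in labels], classes


def decode_labels(codes, classes):
    """Map integer codes back to the original label values."""
    return [classes[code] for code in codes]


# First five label values from the Y1 printout in the question.
sample = [-2.0, 4.0, -3.0, 1.0, 0.0]

codes, classes = encode_labels(sample)
print(codes)                          # [1, 4, 0, 3, 2]
print(decode_labels(codes, classes))  # [-2.0, 4.0, -3.0, 1.0, 0.0]
```

Since the model is trained on the encoded values, its predictions will also be encoded; with cuML, `enc.inverse_transform` should map them back to the original labels.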

Shout-out to @beckernick for helping me solve this!
