This is my code in Google Colab:

import cupy as cp
import numpy as np
import joblib
import dask_ml.model_selection as dcv

def ParamSelection(X, Y, nfolds):
    param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],'kernel':['linear'], 'gamma':[0.001, 0.01, 0.1, 1, 10, 100]}
    svc = svm.SVC()
    grid_search = dcv.GridSearchCV(svc, param_grid, cv = nfolds)
    grid_search.fit(X, Y)
    print(grid_search.best_params_)
    print(grid_search.best_estimator_)
    print(grid_search.best_score_)
    return grid_search.best_estimator_

svc = ParamSelection(X_train.astype(cp.int_), y_train.astype(cp.int_), 10) 

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-163-56196d6a31bd> in <module>()
     15     return grid_search.best_estimator_
     16 
---> 17 svc = ParamSelection(X_train.astype(cp.int_), y_train.astype(cp.int_), 10)
     18 

9 frames
/usr/local/lib/python3.7/site-packages/cudf/core/frame.py in __array__(self, dtype)
   1677     def __array__(self, dtype=None):
   1678         raise TypeError(
-> 1679             "Implicit conversion to a host NumPy array via __array__ is not "
   1680             "allowed, To explicitly construct a GPU array, consider using "
   1681             "cupy.asarray(...)\nTo explicitly construct a "

TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU array, consider using cupy.asarray(...)
To explicitly construct a host array, consider using .to_array()

For train_test_split I use the Dask-ML version: from dask_ml.model_selection import train_test_split. I really don't know where the problem is.

Any suggestions?

1 Answer

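Somewhere internally, Dask-ML is most likely calling np.asarray on your GPU-backed data (the traceback shows the call landing in a cuDF object's __array__ method). That would trigger an implicit GPU-to-CPU (device-to-host) copy, which cuDF does not allow, so it raises a TypeError instead.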

If you pass CPU-based data to the cuML estimator instead, this should work as expected:

import cupy as cp
import dask_ml.model_selection as dcv
from sklearn.datasets import make_classification
from cuml import svm

# make_classification returns host (NumPy) arrays, i.e. CPU-based data
X, y = make_classification(
    n_samples=100
)

def ParamSelection(X, Y, nfolds):
    param_grid = {'C': [0.001, 10, 100], 'gamma': [0.001, 100]}
    svc = svm.SVC()
    grid_search = dcv.GridSearchCV(svc, param_grid, cv=nfolds)
    grid_search.fit(X, Y)
    print(grid_search.best_params_)
    print(grid_search.best_estimator_)
    print(grid_search.best_score_)
    return grid_search.best_estimator_

svc = ParamSelection(X, y, 2)

Output:

{'C': 10, 'gamma': 0.001}
SVC()
0.8399999737739563
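
If your training data already lives on the GPU, one way to apply this is to copy it to the host explicitly before handing it to Dask-ML. The sketch below is only an illustration: `to_host` is a hypothetical helper name, not part of cuDF, CuPy, or Dask-ML. It assumes that cuDF objects expose `to_pandas()` and that CuPy arrays expose `get()` together with `__cuda_array_interface__`.

```python
import numpy as np


def to_host(obj):
    """Explicitly copy GPU-backed data to a host NumPy array.

    Hypothetical helper for illustration; it relies on duck typing so it
    can be imported even when cuDF or CuPy are not installed.
    """
    # Already a host array: nothing to copy.
    if isinstance(obj, np.ndarray):
        return obj
    # cuDF DataFrame/Series (assumption: they provide to_pandas()).
    if hasattr(obj, "to_pandas"):
        return obj.to_pandas().to_numpy()
    # CuPy arrays (assumption: they provide get() and the CUDA array interface).
    if hasattr(obj, "__cuda_array_interface__") and hasattr(obj, "get"):
        return obj.get()
    # Lists, tuples and other host-side containers.
    return np.asarray(obj)


# Intended use with the function from the question (not executed here):
# svc = ParamSelection(to_host(X_train), to_host(y_train), 10)
```

Keep in mind that this moves the whole dataset through host memory, so for large inputs the copy itself may become noticeable.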
Answered 2021-11-01T19:44:51.803