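I am new to scikit-learn. I am trying to use preprocessing.OneHotEncoder to encode my training and test data. After encoding, I tried to train a random forest classifier on that data, but fitting fails with the following error (traceback below):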
99 model.fit(X_train, y_train)
100 preds = model.predict_proba(X_cv)[:, 1]
101
C:\Python27\lib\site-packages\sklearn\ensemble\forest.pyc in fit(self, X, y, sample_weight)
288
289 # Precompute some data
--> 290 X, y = check_arrays(X, y, sparse_format="dense")
291 if (getattr(X, "dtype", None) != DTYPE or
292 X.ndim != 2 or
C:\Python27\lib\site-packages\sklearn\utils\validation.pyc in check_arrays(*arrays, **options)
200 array = array.tocsc()
201 elif sparse_format == 'dense':
--> 202 raise TypeError('A sparse matrix was passed, but dense '
203 'data is required. Use X.toarray() to '
204 'convert to a dense numpy array.')
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
I tried converting the sparse matrix to a dense one with X.toarray() and X.todense(), but when I do that I get the following traceback instead:
99 model.fit(X_train.toarray(), y_train)
100 preds = model.predict_proba(X_cv)[:, 1]
101
C:\Python27\lib\site-packages\scipy\sparse\compressed.pyc in toarray(self)
548
549 def toarray(self):
--> 550 return self.tocoo(copy=False).toarray()
551
552 ##############################################################
C:\Python27\lib\site-packages\scipy\sparse\coo.pyc in toarray(self)
236
237 def toarray(self):
--> 238 B = np.zeros(self.shape, dtype=self.dtype)
239 M,N = self.shape
240 coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel())
ValueError: array is too big.
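My guess is that the dense version of the one-hot encoded matrix simply does not fit in memory. Below is a rough sketch of how the required size could be estimated from the matrix shape. The helper name `dense_size_bytes` and the example shape are made up for illustration and are not the real dimensions of my X_train. The default of 8 bytes per element assumes a float64 dtype.

```python
def dense_size_bytes(shape, itemsize=8):
    """Return the bytes a dense 2-D array of this shape would need.

    itemsize=8 assumes float64; in practice it could be read from
    X_train.dtype.itemsize, and shape from X_train.shape.
    """
    rows, cols = shape
    return rows * cols * itemsize


# Hypothetical shape, for illustration only (not my actual data).
example_shape = (30000, 15000)
size = dense_size_bytes(example_shape)
print("dense array would need about %.1f GiB" % (size / 2.0 ** 30))
```

If the estimate is larger than the available memory (or than what a 32-bit Python process can address), toarray() cannot succeed no matter how it is called.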
Can anyone help me solve this problem?
Thanks