Below is code I wrote that performs feature selection using RFE with a LinearSVC estimator, then uses the reduced data to fit a KNeighborsClassifier and predict with it.
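from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.neighbors import KNeighborsClassifier

# Feature selection: recursively eliminate features, 42 per step, until 700 remain
clf = LinearSVC(C=10, class_weight='auto')
rfe = RFE(estimator=clf, n_features_to_select=700, step=42)
rfe.fit(X, trainLabels)
reduced_train_data = rfe.transform(X)
print "reduced_train_data.shape ", reduced_train_data.shape
reduced_test_data = rfe.transform(test)

# Fit k-NN on the reduced training data, then predict on the reduced test data
neigh = KNeighborsClassifier(n_neighbors=5, weights='distance', algorithm='ball_tree')
print "knn initiated"
neigh.fit(reduced_train_data, trainLabels)
print "knn fitted"
test_predict = neigh.predict(reduced_test_data)
print "knn predicted"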
Here is the output:
reduced_train_data.shape (42000, 700)
knn initiated
knn fitted
Then I see the following error:
Traceback (most recent call last):
File "E:\Coursera\KaggleDataProjects\DigitRecognition\main.py", line 74, in <module>
test_predict = neigh.predict(reduced_test_data)
File "C:\Python27\lib\site-packages\sklearn\neighbors\classification.py", line 146, in predict
neigh_dist, neigh_ind = self.kneighbors(X)
File "C:\Python27\lib\site-packages\sklearn\neighbors\base.py", line 313, in kneighbors
return_distance=return_distance)
File "binary_tree.pxi", line 1295, in sklearn.neighbors.ball_tree.BinaryTree.query (sklearn\neighbors\ball_tree.c:9889)
File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 91, in array2d
X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError
This error does not occur on every run when I change the parameters slightly. Can someone explain what needs to be done to fix it?
Initial dimensions of the training data (X): 42000 x 784
Initial dimensions of the test data (test): 28000 x 784
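For reference, the traceback shows the MemoryError being raised while the whole query array is converted with np.asarray inside the ball tree query. One possible workaround, which I have not confirmed fixes this, is to call predict on smaller row slices so each conversion handles less data at once. The sketch below is only an illustration: predict_in_batches, dense_size_mb, and the batch size of 1000 are hypothetical names and values, not part of scikit-learn, and the size estimate assumes a dense float64 array.

```python
import numpy as np


def dense_size_mb(n_rows, n_cols, dtype=np.float64):
    """Approximate size in MB of a dense array (assumes no sparsity)."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 1e6


def predict_in_batches(model, X, batch_size=1000):
    """Call model.predict on consecutive row slices of X and join the results.

    `model` is any object with a predict(X) method; `batch_size` is an
    arbitrary illustrative default, not a recommended value.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    n_rows = X.shape[0]
    parts = []
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # Only this slice is handed to the model, so any internal copy
        # made during prediction covers at most batch_size rows.
        parts.append(np.asarray(model.predict(X[start:stop])))
    if not parts:
        return np.empty(0)
    return np.concatenate(parts)
```

With the names from my code this would be called as test_predict = predict_in_batches(neigh, reduced_test_data, batch_size=1000). By the estimate above, a dense float64 copy of the full 28000 x 700 test array is about 156.8 MB, so each 1000-row slice would need roughly 5.6 MB instead.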