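I am using label propagation in scikit-learn for semi-supervised classification. I have 17,000 data points with 7 dimensions each. I cannot run it on this dataset: it throws a NumPy "array is too big" error. However, it works fine on a relatively small dataset (for example, 200 points). Can anyone suggest a fix?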
label_prop_model.fit(np.array(data), labels)
File "/usr/lib/pymodules/python2.7/sklearn/semi_supervised/mylabelprop.py", line 58, in fit
graph_matrix = self._build_graph()
File "/usr/lib/pymodules/python2.7/sklearn/semi_supervised/mylabelprop.py", line 108, in _build_graph
affinity_matrix = self._get_kernel(self.X_) # get the affinty martix from the data using rbf kernel
File "/usr/lib/pymodules/python2.7/sklearn/semi_supervised/mylabelprop.py", line 26, in _get_kernel
return rbf_kernel(X, X, gamma=self.gamma)
File "/usr/lib/pymodules/python2.7/sklearn/metrics/pairwise.py", line 350, in rbf_kernel
K = euclidean_distances(X, Y, squared=True)
File "/usr/lib/pymodules/python2.7/sklearn/metrics/pairwise.py", line 173, in euclidean_distances
distances = safe_sparse_dot(X, Y.T, dense_output=True)
File "/usr/lib/pymodules/python2.7/sklearn/utils/extmath.py", line 79, in safe_sparse_dot
return np.dot(a, b)
ValueError: array is too big.
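
For context, the traceback shows the RBF kernel being computed between X and itself, which yields a dense n-by-n matrix. A rough sketch of the arithmetic follows, assuming 8-byte float64 entries. The helper name `dense_kernel_bytes` is made up for illustration, and the comparison against a signed 32-bit size limit is only a guess at why NumPy refuses the allocation on this machine.

```python
def dense_kernel_bytes(n_samples, itemsize=8):
    """Bytes needed for a dense n_samples x n_samples matrix.

    itemsize=8 assumes float64 entries (an assumption, not read from the traceback).
    """
    return n_samples * n_samples * itemsize


# Hypothetical limit: the largest byte count a signed 32-bit index can address.
LIMIT_32BIT = 2**31 - 1

for n in (200, 17000):
    size = dense_kernel_bytes(n)
    print("%d points: %d bytes (%.2f GiB), over 32-bit limit: %s"
          % (n, size, size / 2.0**30, size > LIMIT_32BIT))
```

With 200 points the matrix is tiny, while with 17,000 points it needs more than 2 GiB before any other working memory is counted, which would fit the behaviour described. If that is the cause, the sparse `kernel='knn'` option of scikit-learn's `LabelPropagation` may be worth trying, since it does not build the full dense matrix.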