python - Sklearn digits dataset

Question

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()

print(digits.data)

classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.data[:-1], digits.target[:-1]

x = x.reshape(1,-1)
y = y.reshape(-1,1)
print((x))

classifier.fit(x, y)
###
print('Prediction:', classifier.predict(digits.data[-3]))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

I also reshaped x and y, but I still get an error message:

Found input variables with inconsistent numbers of samples: [1, 1796]

y is a 1-D array with 1796 elements, while x has many more. How can it show 1 for x?

score 1 · Accepted Answer

Actually, scrap my suggestions below:

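This link describes the general dataset API. The data attribute is a 2-D array containing every image, each one already flattened into a single row: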

import sklearn.datasets
digits = sklearn.datasets.load_digits()
digits.data.shape
#: (1797, 64)

That's all you need to provide; no reshaping is required. Likewise, the target attribute is a 1-D array holding one label per image:

digits.target.shape
#: (1797,)

No reshaping needed. Just split the data into training and test sets and run it.
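A minimal sketch of that split-and-fit workflow, assuming a recent scikit-learn where train_test_split lives in sklearn.model_selection. The 25% test size, random_state=0 and gamma=0.001 (the value used in scikit-learn's own digits example, instead of the question's 0.4) are illustrative assumptions, not values from the original post:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target   # (1797, 64) and (1797,): use them as-is, no reshape

# Assumed split parameters, chosen only for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# gamma=0.001 is an assumption borrowed from scikit-learn's digits example
classifier = svm.SVC(gamma=0.001, C=100)
classifier.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out test set
accuracy = classifier.score(X_test, y_test)
print('Test accuracy:', accuracy)
```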

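Try printing x.shape and y.shape. I think you'll find they are (1, 114944) and (1796, 1) respectively: reshape(1, -1) packs all 1796 × 64 pixel values into a single row, which scikit-learn reads as one sample. When you call fit on a scikit-learn classifier, it requires X and y to have the same number of samples, i.e. the same length along the first axis.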
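To see where the 1 comes from, this sketch rebuilds the arrays from the question and catches the error that fit() raises. It is for illustration only; scikit-learn may also emit a warning about the column-shaped y, and the exact wording can vary between versions:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
x, y = digits.data[:-1], digits.target[:-1]

x_bad = x.reshape(1, -1)   # all 1796 * 64 values in ONE row -> (1, 114944), i.e. 1 sample
y_bad = y.reshape(-1, 1)   # one label per row -> (1796, 1), i.e. 1796 samples
print(x_bad.shape, y_bad.shape)

error_message = None
try:
    svm.SVC(gamma=0.4, C=100).fit(x_bad, y_bad)
except ValueError as exc:
    # fit() compares the number of samples (first axis) of X and y
    error_message = str(exc)
print(error_message)

# Without the reshape the first axes already agree: 1796 samples each
print(x.shape, y.shape)
```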

A clue: why do you reshape them with different arguments?

x = x.reshape(1, -1)
y = y.reshape(-1, 1)

Maybe try:

x = x.reshape(-1, 1)

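Completely unrelated to your question, but you predict digits.data[-3], whereas the only element left out of the training set is digits.data[-1]. Not sure whether that's intentional.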
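A sketch of predicting the one sample that was actually held out. predict() expects a 2-D array, so slicing with [-1:] (shape (1, 64)) works where digits.data[-1] (shape (64,)) is rejected by recent scikit-learn versions. gamma=0.001 is an assumption, not the question's value:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

classifier = svm.SVC(gamma=0.001, C=100)   # gamma is an assumed value
# Train on every sample except the last one...
classifier.fit(digits.data[:-1], digits.target[:-1])

# ...and predict exactly that held-out sample, keeping it 2-D: (1, 64)
held_out = digits.data[-1:]
prediction = classifier.predict(held_out)
print(held_out.shape, 'Prediction:', prediction, 'True label:', digits.target[-1])
```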

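In any case, it's better to check more of your classifier's results with scikit-learn's metrics package. This page has an example of using it on the digits dataset.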
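A minimal sketch using sklearn.metrics, assuming an ordered 50/50 split similar in spirit to scikit-learn's digits example; the split point and gamma=0.001 are assumptions:

```python
from sklearn import datasets, metrics, svm

digits = datasets.load_digits()
half = len(digits.data) // 2   # assumed split: first half trains, second half tests

classifier = svm.SVC(gamma=0.001, C=100)
classifier.fit(digits.data[:half], digits.target[:half])

expected = digits.target[half:]
predicted = classifier.predict(digits.data[half:])

# Per-class precision, recall and F1, plus the full confusion matrix
print(metrics.classification_report(expected, predicted))
cm = metrics.confusion_matrix(expected, predicted)
print(cm)
acc = metrics.accuracy_score(expected, predicted)
print('Accuracy:', acc)
```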

score 0 · Accepted Answer

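Reshaping turns each 8x8 matrix into a 1-D vector that can be used as features. You need to reshape the whole X array, not just the training data, because the vectors you use for prediction must have the same format.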
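A short sketch of that reshape on its own, which also checks that flattening digits.images row by row reproduces the values already stored in digits.data:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
images = digits.images                 # (1797, 8, 8): one 8x8 matrix per sample
n_samples = len(images)

# Flatten every 8x8 image into a 64-element feature vector, row by row
x = images.reshape((n_samples, -1))    # (1797, 64)

# The first 8 features of sample 0 are the first row of image 0
first_row_matches = np.array_equal(x[0, :8], images[0, 0])
# The flattened images match the pre-flattened digits.data
same_as_data = np.array_equal(x, digits.data)
print(x.shape, first_row_matches, same_as_data)
```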

The following code shows how:

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()


classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.images, digits.target

# only reshape X, since each image is an 8x8 matrix that needs to be flattened
n_samples = len(digits.images)
x = x.reshape((n_samples, -1))
print("before reshape:" + str(digits.images[0]))
print("After reshape" + str(x[0]))


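# train on every sample except the last two
classifier.fit(x[:-2], y[:-2])
###
# predict() expects a 2-D array (n_samples, n_features), so slice x[-2:-1]
# (shape (1, 64)) rather than indexing x[-2] (shape (64,))
print('Prediction:', classifier.predict(x[-2:-1]))
###
plt.imshow(digits.images[-2], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

###
print('Prediction:', classifier.predict(x[-1:]))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()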

It will output:

before reshape:[[  0.   0.   5.  13.   9.   1.   0.   0.]
 [  0.   0.  13.  15.  10.  15.   5.   0.]
 [  0.   3.  15.   2.   0.  11.   8.   0.]
 [  0.   4.  12.   0.   0.   8.   8.   0.]
 [  0.   5.   8.   0.   0.   9.   8.   0.]
 [  0.   4.  11.   0.   1.  12.   7.   0.]
 [  0.   2.  14.   5.  10.  12.   0.   0.]
 [  0.   0.   6.  13.  10.   0.   0.   0.]]
After reshape[  0.   0.   5.  13.   9.   1.   0.   0.   0.   0.  13.  15.  10.  15.   5.
   0.   0.   3.  15.   2.   0.  11.   8.   0.   0.   4.  12.   0.   0.   8.
   8.   0.   0.   5.   8.   0.   0.   9.   8.   0.   0.   4.  11.   0.   1.
  12.   7.   0.   0.   2.  14.   5.  10.  12.   0.   0.   0.   0.   6.  13.
  10.   0.   0.   0.]

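It also predicts the last two images, which were not used for training, correctly. You may, however, want a larger split between the test and training sets.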
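A sketch of a larger hold-out, assuming you keep the answer's flattening step but hold back the last 20% of the samples instead of two images. test_size=0.2 and gamma=0.001 are assumptions, not values from the answer:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
n_samples = len(digits.images)
x = digits.images.reshape((n_samples, -1))   # flatten each 8x8 image to 64 features
y = digits.target

# shuffle=False keeps the original order, so the test set is the last 20% of the data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False)

classifier = svm.SVC(gamma=0.001, C=100)     # assumed gamma
classifier.fit(x_train, y_train)

n_correct = int((classifier.predict(x_test) == y_test).sum())
print(f'{n_correct} of {len(y_test)} held-out images predicted correctly')
```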
