python - Basic softmax model implementation on 150x150 images

Question

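I am learning TensorFlow, and I tried to adapt the basic softmax MNIST example to work on my own image set. The images are aerial photographs of buildings, and I want to classify them by roof type. There are 4 possible classes.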

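The simple (and perhaps naive) idea was to resize the images (since they are not all the same size) and flatten them, then change the tensor shapes in the code and run it. Of course, it does not work. First, let me show you the code.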

import csv

import numpy as np
import tensorflow as tf

# Load csv Data
filenames = []
_answers = []
with open('/home/david/DSG/id_train.csv') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    for row in csv_reader:
        one_hot_vec = [0, 0, 0, 0]
        one_hot_vec[int(row[1])-1] = 1
        _answers.append(np.asarray(one_hot_vec))
        filenames.append("/home/david/DSG/roof_images/" + str(row[0]) + ".jpg")


sess = tf.InteractiveSession()

# Image Loading and processing
filename_q = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
key, value = reader.read(filename_q)
__img = tf.image.decode_jpeg(value, channels=1)
_img = tf.expand_dims(tf.image.convert_image_dtype(__img, tf.float32),0)
img = tf.image.resize_nearest_neighbor(_img, [150,150])

# Actual model
x = tf.placeholder(tf.float32, [None, 22500])
W = tf.Variable(tf.zeros([22500, 4]))
b = tf.Variable(tf.zeros([4]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Training algorithm
y_ = tf.placeholder(tf.float32, [None, 4])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y,1e-10,1.0)), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Evaluate model: this checks the results from y (the prediction matrix) against the known answers (y_)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

coord = tf.train.Coordinator()
init_op = tf.initialize_all_variables()
sess.run(init_op)

threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# Loads and process all the images, adding them to an array for later use
images = []
for i in range(8000):
    if i % 100 == 0:
        print("Processing Images " + str(100*(i+100)/8000) + "% complete")
    image = img.eval().flatten()
    images.append(image)

# Train our model
for i in range(80):
    print("Training the Model " + str(100*(i+1)/80) + "% complete")
    batchImages = images[i*100:((i+1)*100)]
    # `answers` was never defined; build the batch from the `_answers` list loaded above
    batchAnswers = np.asarray(_answers[i*100:((i+1)*100)]).astype(float)
    # Here's a debug line I put in to see what the numbers were
    print(sess.run(y, feed_dict={x: batchImages, y_: batchAnswers}))
    sess.run(train_step, feed_dict={x: batchImages, y_: batchAnswers})

coord.request_stop()
coord.join(threads)

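As you can see, I am printing the y values coming out of the softmax. The result is a tensor in which every row looks exactly like [0., 0., 0., 1.]. I found that very odd, so I printed tf.matmul(x, W) + b.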

The result was this:

[[-236.86216736 -272.89904785   59.67744446  450.08377075]
 [-327.19482422 -384.06918335   87.47353363  623.79052734]
 [-230.79460144 -264.78787231   60.29759598  435.28485107]
 [-188.10324097 -212.30155945   53.8230629   346.58175659]
 [-180.26617432 -209.45767212   48.90292358  340.82092285]
 [-177.13232422 -200.59474182   45.97179413  331.75531006]
 [-225.94104004 -258.97390747   61.54353333  423.37136841]
 [-259.33599854 -290.73773193   67.69062042  482.38308716]
 [-151.53468323 -174.09906006   39.97481537  285.65893555]
 [-237.23356628 -272.71789551   65.12500763  444.82647705]
 ..... you get the idea
 [-195.14971924 -221.30851746   53.09790802  363.36032104]
 [-157.30508423 -175.47320557   40.4044342   292.37384033]
 [-178.94332886 -203.36262512   47.0838356   335.22219849]
 [-180.61688232 -200.0609436    45.12242508  335.55541992]
 [-145.7559967  -163.06838989   35.25980377  273.56466675]
 [-194.07254028 -213.78709412   53.14990997  354.70977783]
 [-191.92044067 -219.13395691   49.84062958  361.21377563]]

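Computing the softmax by hand, the first, second, and third elements come out at the E-200 scale, essentially zero, and the fourth element gives a number greater than 1. Since every row follows this pattern, something is clearly wrong.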
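To make the hand calculation concrete, here is a minimal NumPy sketch that applies a numerically stable softmax to the first row of the logits printed above. NumPy and the helper name `softmax` are assumptions for illustration; they are not part of the original TensorFlow code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis (illustrative helper)."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# First row of the printed tf.matmul(x, W) + b output.
row = np.array([-236.86216736, -272.89904785, 59.67744446, 450.08377075])
probs = softmax(row)
print(probs)
```

The unnormalized exponential of the fourth logit is far larger than 1, but after normalization no softmax entry can exceed 1. With logits this far apart the largest one takes essentially all of the probability mass, and in float32 the other three entries underflow to 0, which is consistent with the printed [0., 0., 0., 1.].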

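I have checked the inputs: my answers are one-hot vectors like [0, 1, 0, 0], and my images are flattened, with values normalized to floats between 0 and 1, just like in the MNIST example.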

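I also noticed that in the MNIST example the matmul values are much smaller, on the order of E0. Is that because each image there has 784 elements rather than 22500? Is that the cause of the problem?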
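One way to probe the input-size question is a small NumPy experiment: take a single gradient-descent step from zero weights (learning rate 0.5, batch of 100, as in the code above) and compare the largest logit for 784 versus 22500 features. This is only a sketch under loud assumptions: uniformly random pixels and random labels stand in for the real images, and the function name `max_logit_after_one_step` is hypothetical.

```python
import numpy as np

def max_logit_after_one_step(n_features, labels, lr=0.5, seed=0):
    """Largest |logit| after one gradient step from zero weights (illustrative)."""
    rng = np.random.default_rng(seed)
    n_samples, n_classes = labels.shape
    # Assumption: uniform random values in [0, 1) stand in for normalized pixels.
    x = rng.random((n_samples, n_features))
    # With W = 0 and b = 0, softmax outputs are uniform over the classes.
    probs = np.full((n_samples, n_classes), 1.0 / n_classes)
    # Gradient of the mean cross-entropy with respect to W and b.
    grad_w = x.T @ (probs - labels) / n_samples
    grad_b = (probs - labels).mean(axis=0)
    w = -lr * grad_w
    b = -lr * grad_b
    return np.abs(x @ w + b).max()

rng = np.random.default_rng(42)
labels = np.eye(4)[rng.integers(0, 4, size=100)]  # random one-hot labels
small = max_logit_after_one_step(784, labels)
large = max_logit_after_one_step(22500, labels)
print(small, large)
```

Under these assumptions the logits grow roughly in proportion to the number of input features, so a learning rate that is tolerable for 784 inputs can saturate the softmax for 22500 inputs. Real MNIST digits are mostly background pixels, so their magnitudes will differ from this random stand-in.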

Heck, maybe this will never work for some reason. I need some help.

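Edit: I decided to check whether the image size has any effect, and sure enough, matmul does give smaller numbers. They still show a pattern, though, so I ran them through softmax again and got this output: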

[[  2.12474524e-20   1.00000000e+00   1.10456488e-18   0.00000000e+00]
 [  3.22400550e-21   1.00000000e+00   1.24568592e-19   0.00000000e+00]
 [  2.49283055e-28   1.00000000e+00   6.52334536e-26   0.00000000e+00]
 [  4.73190862e-23   1.00000000e+00   3.71980738e-21   0.00000000e+00]
 [  1.11151765e-26   1.00000000e+00   4.14652626e-24   0.00000000e+00]
 [  2.23096276e-22   1.00000000e+00   7.21511359e-21   0.00000000e+00]
 [  1.41888640e-23   1.00000000e+00   2.13637447e-21   0.00000000e+00]
 [  3.55662848e-17   1.00000000e+00   5.14018079e-16   4.06785808e-33]
 [  8.25783417e-26   1.00000000e+00   2.95267040e-23   0.00000000e+00]
 [  1.09395607e-25   1.00000000e+00   3.76775998e-23   0.00000000e+00]
 [  9.34879669e-13   1.00000000e+00   1.07488766e-11   7.21446627e-25]
 [  3.09687017e-34   1.00000000e+00   5.22547065e-31   0.00000000e+00]
 [  2.10362117e-22   1.00000000e+00   1.31067148e-20   0.00000000e+00]
 [  5.86830220e-23   1.00000000e+00   9.55902033e-21   0.00000000e+00]
 [  9.59656235e-17   1.00000000e+00   2.98987045e-15   7.10348533e-32]
 [  2.33712669e-16   1.00000000e+00   3.26934410e-15   1.55066807e-31]
 [  1.09302052e-27   1.00000000e+00   5.34793657e-25   0.00000000e+00]
 [  1.67101349e-25   1.00000000e+00   1.15098012e-22   0.00000000e+00]
 [  4.46111042e-26   1.00000000e+00   1.23599421e-23   0.00000000e+00]
 [  1.31791856e-24   1.00000000e+00   2.25831162e-22   0.00000000e+00]
 [  2.19408324e-12   1.00000000e+00   5.67631081e-11   1.22608556e-23]]

So something else must definitely be wrong.

score 1 · Accepted Answer

Your dataset is probably imbalanced, which makes the network harder to train because it tends to predict the most frequent class.
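A quick way to check for imbalance is to count the labels per class. The sketch below assumes one-hot labels like the `_answers` list in the question; the example labels and the helper name `class_counts` are made up for illustration.

```python
import numpy as np

def class_counts(one_hot_labels):
    """Number of samples per class, given one-hot labels (illustrative helper)."""
    labels = np.asarray(one_hot_labels)
    return labels.sum(axis=0).astype(int)

# Hypothetical labels: class indices 3, 3, 3, 1, 3, 0, 3, 3 encoded as one-hot rows.
example = np.eye(4, dtype=int)[[3, 3, 3, 1, 3, 0, 3, 3]]
counts = class_counts(example)
# Accuracy obtained by always predicting the most frequent class.
majority_baseline = counts.max() / counts.sum()
print(counts, majority_baseline)
```

If one class dominates, a model that always predicts it already reaches the majority baseline, so any reported accuracy should be compared against that number.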

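I don't think your single-layer model is enough to train on the whole dataset. You should probably add more layers and use convolutions along with max pooling.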

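However, if you want to verify that the model can work, try training it on far fewer images (e.g. 50 images) and see whether it can overfit that small training set.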
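The overfitting check can be sketched as follows. To keep it self-contained, this uses plain NumPy rather than TensorFlow, and random features with random labels stand in for the 50 images; the sizes, learning rate, step count, and function names are all assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise numerically stable softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def train_softmax_regression(x, y_one_hot, lr=0.05, steps=500):
    """Full-batch gradient descent on the mean cross-entropy, starting from zeros."""
    n_samples, n_features = x.shape
    n_classes = y_one_hot.shape[1]
    w = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    for _ in range(steps):
        probs = softmax(x @ w + b)
        error = (probs - y_one_hot) / n_samples
        w -= lr * (x.T @ error)
        b -= lr * error.sum(axis=0)
    return w, b

rng = np.random.default_rng(0)
# Assumption: 50 samples with 500 random features stand in for 50 flattened images.
x_small = rng.standard_normal((50, 500))
y_small = np.eye(4)[rng.integers(0, 4, size=50)]

w, b = train_softmax_regression(x_small, y_small)
predictions = np.argmax(x_small @ w + b, axis=1)
train_accuracy = np.mean(predictions == np.argmax(y_small, axis=1))
print("training accuracy on 50 samples:", train_accuracy)
```

If a model cannot reach near-perfect training accuracy on such a tiny set, the problem is more likely in the pipeline (inputs, labels, learning rate) than in the model's capacity.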
