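I have a list of size 50000, say a. Each element is a tuple, say b = a[0]. Each tuple consists of 2 lists, say c = b[0] and d = b[1]. The first list, i.e. c, has length 784 and the second, i.e. d, has length 10. From this list, I need to extract the following:
Group the first 10 elements of list a. From these 10 tuples, extract their first elements (c) and put them in a matrix of size 784x10. Also extract the second elements of the tuples and put them in another matrix of size 10x10. Repeat this for every batch of 10 elements in list a.
Can this be done in a single line using a list comprehension, or do I have to write multiple for loops? Which method is efficient and best? Note: it is fine if I get either a list or a numpy.ndarray matrix.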

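Additional information: I am following this tutorial on neural networks, which aims to design a neural network that recognizes handwritten digits. The MNIST database is used to train the network. The training data is in the format described above. I need to create an input_images matrix and an expected_output matrix for each mini_batch.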

Here is the code I have tried. I get a single list of size 50000. It is not split into mini_batches.

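import gzip
import pickle

import numpy

# vectorized_result() is not defined here; it is assumed to come from the tutorial code
f = gzip.open('mnist.pkl.gz', 'rb')
tr_d, va_d, te_d = pickle.load(f, encoding='latin1')
f.close()
training_inputs = [numpy.reshape(x, (784, 1)) for x in tr_d[0]]
training_results = [vectorized_result(y) for y in tr_d[1]]
# zip() returns an iterator in Python 3, so materialize it before calling len() or slicing
training_data = list(zip(training_inputs, training_results))

# training_data is a list of size 50000 as described above
n = len(training_data)  # n=50000
mini_batch_size = 10
# My attempt: this yields one flat list of 50000 items rather than separate mini-batches
mini_batch = [x[0] for k in range(0, n, mini_batch_size) for x in training_data[k:k+mini_batch_size]]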

The mnist.pkl.gz file is available here.


1 Answer


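I wrote my answer before you added the source code, so it is based purely on the first part, which you described in words. As a result, it is not very robust to changes in input size. If you read further in the book, Anders Nielsen actually provides his own implementation.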

My main answer is not a one-liner, because that would obscure what the code does, and I strongly advise writing complex processes like this out in full so that you understand what actually happens. The code builds a firstMatrix, which holds the c elements as columns, and a secondMatrix, which holds the d elements as columns. It does this for every batch of 10. I didn't know what you want to do with the matrices afterwards, so it simply builds them for each batch. If you want to collect or group them, please say so and I will try to implement it.

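import numpy as np

batch_size = 10
# Step through a in slices of 10; np.array_split(a, 10) would instead make 10 chunks of 5000
for start in range(0, len(a), batch_size):
    batch = a[start:start + batch_size]
    firstMatrix = np.zeros(shape=(784, len(batch)))
    secondMatrix = np.zeros(shape=(10, len(batch)))
    for i, (c, d) in enumerate(batch):
        # np.ravel accepts both flat lists and (784, 1) / (10, 1) column vectors
        firstMatrix[:, i] = np.ravel(c)
        secondMatrix[:, i] = np.ravel(d)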
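
If you want to keep the matrices instead of overwriting them on every pass, one option is to wrap the loop in a generator. The following is only a minimal sketch: the name iter_batch_matrices and the small synthetic list standing in for the real data are my own assumptions, not something from your code or from the book.

```python
import numpy as np


def iter_batch_matrices(data, batch_size=10):
    """Yield (inputs, targets) for each consecutive batch of `data`.

    `data` is a sequence of (c, d) pairs. Each c and d may be a flat list
    or a column vector; both are flattened before being stacked as columns.
    """
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        inputs = np.column_stack([np.ravel(c) for c, _ in batch])
        targets = np.column_stack([np.ravel(d) for _, d in batch])
        yield inputs, targets


# Synthetic stand-in for the real data (assumption): 25 samples only.
rng = np.random.default_rng(0)
a = [(rng.random((784, 1)), rng.random((10, 1))) for _ in range(25)]

shapes = [(x.shape, y.shape) for x, y in iter_batch_matrices(a)]
print(shapes)
```

Note that when the length of the list is not a multiple of the batch size, the last batch simply has fewer columns (5 in this example), rather than raising an error.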

If you really want a one-liner, here is one that builds a list of the firstMatrices and another that builds a list of the secondMatrices:

firstMatrices = [np.array([np.ravel(c) for c, _ in a[k:k+10]]).T for k in range(0, len(a), 10)]
secondMatrices = [np.array([np.ravel(d) for _, d in a[k:k+10]]).T for k in range(0, len(a), 10)]
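
Since you said a numpy.ndarray is fine too, the per-batch loop can probably be avoided altogether by stacking everything once and reshaping. This is a sketch under the assumption that the number of samples is an exact multiple of the batch size (true for 50000 and 10); the variable names and the small random data set are placeholders of mine.

```python
import numpy as np

# Synthetic stand-in for the real data (assumption): 40 samples.
rng = np.random.default_rng(0)
n, batch_size = 40, 10
a = [(rng.random(784), rng.random(10)) for _ in range(n)]

# One row per sample.
all_c = np.array([np.ravel(c) for c, _ in a])  # shape (n, 784)
all_d = np.array([np.ravel(d) for _, d in a])  # shape (n, 10)

# (n, 784) -> (n_batches, batch_size, 784) -> (n_batches, 784, batch_size)
first = all_c.reshape(-1, batch_size, 784).transpose(0, 2, 1)
second = all_d.reshape(-1, batch_size, 10).transpose(0, 2, 1)

print(first.shape, second.shape)  # (4, 784, 10) (4, 10, 10)
```

Here first[b] is the 784x10 matrix for batch b and second[b] is the matching 10x10 matrix, with sample b * batch_size + i in column i. If the sample count is not a multiple of the batch size, the reshape raises a ValueError, so the slicing version above is the safer choice in that case.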
answered 2018-12-29T19:53:20.157