tensorflow - AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8) when using the predict function

Question

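Some details about the context:

I am working on Google Colab with a TPU.
The model fits successfully without any problems.
The problem occurs when I try to use the predict function.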

This is the code I use for training:

tpu_model.fit(x, y,
              batch_size=128,
              epochs=60)
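Training works because a batch of 128 can be split evenly across 8 TPU cores (16 samples per core). The sketch below illustrates that split. It assumes data-parallel sharding of each batch along its first axis. The helper `shard_batch` is hypothetical, as are the shapes `(40, 57)` standing in for `(maxlen, len(chars))`. It is not TensorFlow's internal code.

```python
import numpy as np

NUM_CORES = 8  # assumption: the Colab TPU exposes 8 cores, as the error message reports


def shard_batch(batch, num_cores=NUM_CORES):
    """Split a batch evenly along axis 0, one shard per core (hypothetical helper).

    np.split raises ValueError when batch.shape[0] is not divisible by
    num_cores, which is the situation the TPU assertion guards against.
    """
    return np.split(batch, num_cores, axis=0)


# Hypothetical shapes: maxlen=40, len(chars)=57
train_batch = np.zeros((128, 40, 57))
shards = shard_batch(train_batch)
print(len(shards), shards[0].shape)  # 8 (16, 40, 57)
```

A batch of 1 cannot be split into 8 equal shards, so `shard_batch(np.zeros((1, 40, 57)))` raises `ValueError`.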

This is the code I use for prediction:

def generate_output():
    generated = ''
    #sentence = text[start_index: start_index + Tx]
    #sentence = '0'*Tx
    usr_input = input("Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: ")
    # zero pad the sentence to Tx characters.
    sentence = ('{0:0>' + str(maxlen) + '}').format(usr_input).lower()
    generated += usr_input 

    sys.stdout.write("\n\nHere is your poem: \n\n") 
    sys.stdout.write(usr_input)
    for i in range(400):

        x_pred = np.zeros((1, maxlen, len(chars)))

        for t, char in enumerate(sentence):
            if char != '0':
                x_pred[0, t, char_indices[char]] = 1.

        preds = tpu_model.predict(x_pred, batch_size=128, workers=8, verbose=0)[0]  # --> error raised here
        next_index = sample(preds, temperature = 1.0)
        next_char = indices_char[next_index]

        generated += next_char
        sentence = sentence[1:] + next_char

        sys.stdout.write(next_char)
        sys.stdout.flush()

        if next_char == '\n':
            continue
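Each prediction step builds `x_pred` with a leading (batch) dimension of 1. The sketch below isolates the padding and one-hot encoding from `generate_output()` so the resulting shape is easy to check. The helper `encode_sentence` and the toy vocabulary are hypothetical. In the real code, `chars` and `char_indices` come from the training corpus.

```python
import numpy as np


def encode_sentence(usr_input, maxlen, char_indices):
    """Left-pad with '0' to maxlen, lowercase, and one-hot encode (hypothetical helper).

    Mirrors the encoding steps in generate_output(); '0' is treated as padding
    and left as an all-zero row.
    """
    sentence = ('{0:0>' + str(maxlen) + '}').format(usr_input).lower()
    x_pred = np.zeros((1, maxlen, len(char_indices)))  # batch dimension is 1
    for t, char in enumerate(sentence):
        if char != '0':
            x_pred[0, t, char_indices[char]] = 1.
    return sentence, x_pred


# Hypothetical toy vocabulary
chars = sorted(set('abc '))
char_indices = {c: i for i, c in enumerate(chars)}
sentence, x_pred = encode_sentence('Ab c', maxlen=6, char_indices=char_indices)
print(sentence)      # 00ab c
print(x_pred.shape)  # (1, 6, 4)
```

The `{0:0>N}` format spec only pads and never truncates. An input longer than `maxlen` would therefore index past the second axis of `x_pred`.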

Here is the error (I added an arrow in the code above so you can see where it occurs):

AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8)

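This makes no sense to me, because the batch size I use for training is divisible by 8, and the batch size I pass to the predict function is also divisible by 8.

I'm not sure what the problem is or how to fix it. Any help would be greatly appreciated.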

score 0 · Accepted Answer

From the error:

AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8)

It looks like you are using a batch_size of 1, which can be inferred from the first dimension of your input data:

x_pred = np.zeros((1, maxlen, len(chars)))
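The `batch_size` argument to `predict` acts as an upper bound. The batches actually produced depend on how many samples `x_pred` contains. The sketch below shows how samples are chunked, assuming the usual Keras behavior in which the final batch holds the remainder. The helper `batch_sizes` is hypothetical.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches produced when num_samples are consumed batch_size at a time."""
    return [min(batch_size, num_samples - start)
            for start in range(0, num_samples, batch_size)]


print(batch_sizes(1, 128))    # [1]   -> 1 % 8 != 0, so the TPU assertion fails
print(batch_sizes(300, 128))  # [128, 128, 44]
```

With a single sample, `batch_size=128` still yields one batch of size 1, which is not divisible by 8.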

I think you might want to change it to:

x_pred = np.zeros((8, maxlen, len(chars)))

This makes the batch dimension 8, matching the number of TPU cores in use.

Alternatively, you can keep the current batch size of 1 but run on a single TPU core.
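Only `[0]` of the prediction is used, so the single input can be repeated to fill 8 rows and the first row of the output kept. The sketch below shows this approach. `predict_single` is a hypothetical wrapper, and `fake_predict` is a stand-in model for local testing. In the question's loop, `predict_fn` would be something like `lambda x: tpu_model.predict(x, batch_size=8, verbose=0)`, but this has not been checked on a TPU.

```python
import numpy as np

NUM_CORES = 8  # assumption: number of TPU cores reported in the error message


def predict_single(predict_fn, x_single, num_cores=NUM_CORES):
    """Repeat a (1, ...) input num_cores times, predict, and keep the first row.

    All rows are identical, so each output row is the same prediction.
    predict_fn is a stand-in for tpu_model.predict (hypothetical wrapper).
    """
    x_batch = np.repeat(x_single, num_cores, axis=0)  # shape (num_cores, ...)
    assert x_batch.shape[0] % num_cores == 0
    preds = predict_fn(x_batch)
    return preds[0]


def fake_predict(x):
    """Stand-in model: softmax over the last timestep's features."""
    logits = x[:, -1, :]
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


x_pred = np.zeros((1, 6, 4))
x_pred[0, -1, 2] = 1.
preds = predict_single(fake_predict, x_pred)
print(preds.shape)  # (4,)
```

Filling only row 0 and leaving the other 7 rows as zeros, as the answer suggests, works just as well because the remaining rows are discarded.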
