python - Tensorflow: Custom training loop using GPU is slower than CPU

Question

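I'm running this custom training loop for a neural network in Colab, both with and without a GPU, and the training runs faster on the CPU. That makes me think I'm not parallelizing the operations or I'm missing something. I don't think it's because the model is small, since I tried more complex models and the problem persists: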

```python
## Import libraries
import os

# Switch off unnecessary TF warning messages
# (must be set before TensorFlow is imported to take effect)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import matplotlib
# matplotlib.use('TkAgg') # Required to make it run on both Windows and Mac
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np
from tqdm import trange

###############################################################################
################################## Parameters #################################
###############################################################################

gamma = tf.constant(2.0)           # Curvature of the utility function
rho   = tf.constant(0.04)          # Discount rate
A     = tf.constant(0.5)           # TFP
alpha = tf.constant(0.36)          # Returns to scale
delta = tf.constant(0.05)          # Depreciation Rate of Capital

batchSize = 100                    # Batch Size
number_epochs = 100000             # Number of epochs

kMin = 0.1                         # lower bound of sample interval
kMax = 10.0                        # upper bound of sample interval

gridSize = 10000                   # Plotting grid

# Set global seed
tf.random.set_seed(1234)
np.random.seed(1234)

# Value function initial guess
initGuess = -60

# Neural network optimizer
optimizer = keras.optimizers.Adam()

###############################################################################
######################## Value Function Neural Network ########################
###############################################################################

def valueFnNeuralNet(nHidden = 3, nNeurons = 8):
    model = keras.models.Sequential()

    # First hidden layer (input_dim = 1 defines the input shape)
    model.add(keras.layers.Dense(nNeurons, activation = "tanh", input_dim = 1))

    # Remaining hidden layers
    for layer in range(nHidden - 1):
        model.add(keras.layers.Dense(nNeurons, activation = "tanh"))

    # Output layer
    model.add(keras.layers.Dense(1, bias_initializer = keras.initializers.Constant(value = initGuess)))
    return model

def HJB(input, V):
    # tf.gradients only works in graph mode, i.e. when traced by @tf.function
    VPrime = tf.gradients(V(input), input)[0]
    VPrimemax = tf.maximum(VPrime, 1E-7)        # dV/dk

    Y = A * tf.pow(input, alpha)                # Output

    C = tf.pow(VPrimemax, (-1/gamma))           # Consumption

    I = Y - C                                   # Investment

    muK = I - delta * input                     # Capital drift

    U = tf.pow(C, 1-gamma) / (1-gamma)          # Utility

    HJB = U - rho * V(input) + tf.multiply(tf.stop_gradient(VPrimemax), muK)
    return HJB

def Objective(batchSize):
    input = tf.random.uniform(shape = (batchSize,1), minval = kMin, maxval = kMax)
    error = HJB(input, VF)
    return tf.reduce_mean(tf.square(error))

###############################################################################
################################ Training Step ################################
###############################################################################

# Need decorator to run in graph mode instead of eager execution
@tf.function
def training_step():
    with tf.GradientTape() as tape:
        loss = Objective(batchSize)
    grads = tape.gradient(loss, theta)
    optimizer.apply_gradients(zip(grads, theta))
    return loss

###############################################################################
################################ Training Loop ################################
###############################################################################

def train_model(epochs):
    losses = []
    for epoch in trange(epochs):
        loss = training_step()
        losses.append(loss.numpy())
    return losses

###############################################################################
################################### Running ###################################
###############################################################################

# Set up neural network
VF = valueFnNeuralNet()

# Define trainable network parameters
theta = VF.trainable_variables

# Run Model (and output loss evolution)
results = train_model(number_epochs)
```

The output I get is as follows:

Without GPU: 100%|██████████| 100000/100000 [01:30<00:00, 1101.79it/s]

With GPU: 100%|██████████| 100000/100000 [03:36<00:00, 461.47it/s]
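Converting the tqdm rates into time per step makes the gap concrete: roughly 0.9 ms per step on the CPU versus roughly 2.2 ms per step on the GPU, which matches the 01:30 and 03:36 totals for 100,000 steps. A small sketch of that arithmetic (the helper name `ms_per_step` is made up for illustration):

```python
def ms_per_step(iterations_per_second: float) -> float:
    """Convert a tqdm rate in it/s into milliseconds per training step."""
    return 1000.0 / iterations_per_second


cpu_ms = ms_per_step(1101.79)  # rate reported without a GPU
gpu_ms = ms_per_step(461.47)   # rate reported with a GPU
steps = 100_000

print(f"CPU: {cpu_ms:.3f} ms/step, {cpu_ms * steps / 1000:.1f} s total")
print(f"GPU: {gpu_ms:.3f} ms/step, {gpu_ms * steps / 1000:.1f} s total")
print(f"GPU/CPU time ratio: {gpu_ms / cpu_ms:.2f}")
```

At under a millisecond per step, fixed per-step costs (the Python loop, the call into the compiled function, synchronizing to read the loss) can easily outweigh the actual arithmetic.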

score 0 · Accepted Answer

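GPUs pay off for large matrix multiplications. Your input has shape (100, 1), so the gain from spreading the work across the GPU is very small and does not even cover the overhead of switching between the CPU and the GPU (launching kernels and moving data, for example copying the loss back to the host with `loss.numpy()` on every step).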
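To put "very small" into numbers: a Dense layer with n_in inputs and n_out outputs costs about 2 · batch · n_in · n_out floating-point operations per forward pass. The sketch below applies that rule of thumb to the question's network, a 1 → 8 → 8 → 8 → 1 stack with a batch of 100. The helper `dense_stack_flops` is hypothetical, and biases and tanh are ignored:

```python
from typing import Sequence


def dense_stack_flops(batch: int, widths: Sequence[int]) -> int:
    """Approximate FLOPs of one forward pass through a stack of Dense layers.

    widths = [n_in, hidden_1, ..., n_out]. Each layer costs about
    2 * batch * n_in * n_out (one multiply and one add per weight);
    biases and activations are ignored.
    """
    return sum(2 * batch * n_in * n_out for n_in, n_out in zip(widths, widths[1:]))


# The question's network: input_dim = 1, three tanh layers of 8 units, 1 output.
model_flops = dense_stack_flops(100, [1, 8, 8, 8, 1])

# For contrast: one (1000 x 1000) @ (1000 x 1000) matrix product.
big_matmul_flops = dense_stack_flops(1000, [1000, 1000])

print(model_flops)                            # 28800
print(big_matmul_flops)                       # 2000000000
print(round(big_matmul_flops / model_flops))
```

Even allowing for the two evaluations of V inside HJB and the two gradient computations, each training step is only a small multiple of those 28,800 FLOPs, while the single 1000 × 1000 product is roughly 70,000 times larger.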

My guess is that with an input of shape (100, 100) you would see the pattern reverse.
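One way to test that guess is to time a bare matrix product at a few shapes on each available device. Because the question's model is built with `input_dim = 1`, this sketch times `tf.matmul` directly instead of the model. The shapes, the repeat count and the helper `time_matmul` are illustrative choices, not taken from the answer, and no timings are claimed here since they depend entirely on the hardware:

```python
import time

import tensorflow as tf


def time_matmul(device: str, n: int, k: int, repeats: int = 20) -> float:
    """Average seconds for an (n x k) @ (k x k) product placed on `device`."""
    with tf.device(device):
        a = tf.random.uniform((n, k))
        b = tf.random.uniform((k, k))
        tf.matmul(a, b).numpy()      # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        c.numpy()                    # GPU ops run asynchronously: wait for the queued work
        elapsed = time.perf_counter() - start
    return elapsed / repeats


devices = ["/CPU:0"]
if tf.config.list_physical_devices("GPU"):
    devices.append("/GPU:0")

results = {}
for n, k in [(100, 1), (100, 100), (1024, 1024)]:
    for dev in devices:
        results[(dev, n, k)] = time_matmul(dev, n, k)
        print(f"{dev} ({n}, {k}): {results[(dev, n, k)] * 1e3:.4f} ms")
```

If the answer is right, the CPU should come out ahead at (100, 1) and the GPU at the largest shape. Note that this measures one op at a time and leaves out the per-step Python overhead of the question's training loop.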
