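Have you looked at the Sequence object? It lets you create a custom data generator. The idea is to subclass Sequence and override the __len__ and __getitem__ methods. __len__ should return the number of batches in the sequence. The logic that returns the source and target pairs goes in __getitem__, which should return one batch of data. For a multi-input model, you can write __getitem__ so that its output is a dictionary mapping the model's input layers to the data (key = layer name). The same applies to the output tensors. More information can be found in the official documentation I linked above.
Best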
Edit
Based on my understanding of your question, here is the gist:
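import cv2
import numpy as np
from tensorflow.keras.utils import Sequence  # or: from keras.utils import Sequence


class Dataset(Sequence):
    def __init__(self, filenames, batchsize, shape):
        self.filenames = filenames  # List of filenames
        self.batchsize = batchsize
        self.shape = shape  # Shape to which image should be resized

    def __len__(self):
        # Number of full batches; a trailing partial batch is dropped
        return len(self.filenames) // self.batchsize

    def __getitem__(self, idx):
        i = idx * self.batchsize
        X_1 = np.zeros((self.batchsize, self.shape[0], self.shape[1], 3))
        # Replace ... with the dimensions of your target; it depends on your target choice
        y = np.zeros((self.batchsize, ...))
        filenames = self.filenames[i:i + self.batchsize]
        for index, filename in enumerate(filenames):
            image = cv2.imread(filename)
            # Preprocess (your_preprocess stands for your own function)
            image = your_preprocess(image)
            X_1[index] = image
            # You can include your pipeline for the other input (X_2) here as well.
            # Similarly, obtain the target values and load them into y.
        # Replace each placeholder key with the name of the matching model layer;
        # the keys must be distinct, otherwise one input overwrites the other.
        return {"input_layername_1": X_1, "input_layername_2": X_2}, {"output_layername": y}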
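
If you want to sanity-check the indexing before plugging in real images, the same __len__/__getitem__ contract can be tried out with plain Python lists. This is only a rough sketch under a few assumptions: ToyBatches is a made-up stand-in that does not subclass Sequence, the keys "image_input", "meta_input" and "label_output" are hypothetical layer names, and it uses ceiling division so the last, shorter batch is kept (the class above uses floor division and drops it).

```python
import math


class ToyBatches:
    """Plain-Python stand-in that follows the same __len__/__getitem__
    contract as a Sequence subclass (it does not subclass Sequence itself)."""

    def __init__(self, images, metadata, labels, batchsize):
        self.images = images
        self.metadata = metadata
        self.labels = labels
        self.batchsize = batchsize

    def __len__(self):
        # Ceiling division keeps the final, shorter batch;
        # floor division (//) would drop it.
        return math.ceil(len(self.images) / self.batchsize)

    def __getitem__(self, idx):
        i = idx * self.batchsize
        j = i + self.batchsize  # slicing past the end simply yields a shorter batch
        # Hypothetical layer names; real keys must match your model's layers.
        inputs = {"image_input": self.images[i:j], "meta_input": self.metadata[i:j]}
        targets = {"label_output": self.labels[i:j]}
        return inputs, targets


# Ten fake samples, batch size 4: two full batches plus one batch of two.
batches = ToyBatches(
    images=list(range(10)),
    metadata=[n * 10 for n in range(10)],
    labels=[n % 2 for n in range(10)],
    batchsize=4,
)
print(len(batches))
print(batches[2])
```

With floor division the last two samples would never be served, which is fine if you shuffle the filenames every epoch but matters for evaluation. If you keep the partial batch in the real class, size the arrays by len(filenames) for that batch instead of self.batchsize, so the batch is not padded with zero rows.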