
I am trying to work with nolearn and use the ConcatLayer to combine multiple inputs. It works great as long as every input has the same type and shape. I have three different types of inputs that will eventually produce a single scalar output value.

  • The first input is an image of dimensions (288,1001)

  • The second input is a vector of length 87

  • The third is a single scalar value

I am using Conv2DLayer(s) on the first input. The second input uses a Conv1DLayer or a DenseLayer (I'm not sure which would be better, since I can't get far enough to see what happens). I'm not even sure how the third input should be set up, since it is only a single value I want to feed into the network.

The code blows up at the ConcatLayer with: 'Mismatch: input shapes must be the same except in the concatenation axis'
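
The error comes from the shape check ConcatLayer performs before it builds the graph. The sketch below re-implements that rule in plain Python. It is not Lasagne's source, and `concat_output_shape` is a hypothetical helper. It shows why a 4D image tensor, a 2D vector and a scalar column cannot be joined as they are, and why they can once every branch has been reduced to a 2D `(batch, features)` tensor.

```python
def concat_output_shape(shapes, axis=1):
    """Return the concatenated shape, or raise the same kind of error
    ConcatLayer reports. None stands for an unknown (batch) dimension."""
    msg = ("Mismatch: input shapes must be the same except "
           "in the concatenation axis")
    ndim = len(shapes[0])
    if any(len(s) != ndim for s in shapes):
        raise ValueError(msg)  # different ranks can never be concatenated
    axis = axis % ndim
    out = list(shapes[0])
    for i in range(ndim):
        if i == axis:
            continue
        known = {s[i] for s in shapes if s[i] is not None}
        if len(known) > 1:
            raise ValueError(msg)  # a non-concat axis differs
        out[i] = known.pop() if known else None
    sizes = [s[axis] for s in shapes]
    out[axis] = None if None in sizes else sum(sizes)
    return tuple(out)


# Raw inputs: 4D image, 2D vector, 2D scalar column -> ranks differ, so this fails.
raw = [(None, 1, 288, 1001), (None, 87), (None, 1)]
try:
    concat_output_shape(raw)
except ValueError as err:
    print("raw inputs:", err)

# Once each branch ends in a dense layer, every tensor is (batch, features)
# and only the concatenation axis differs.
branch_outputs = [(None, 16), (None, 8), (None, 1)]
print("branch outputs:", concat_output_shape(branch_outputs))  # (None, 25)
```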

I would be forever grateful if someone could write out a super simple network structure that can take these types of inputs and output a single scalar value. I have been googling all day and simply cannot figure this one out.

Here is the fit call, in case it helps. As you can see, I am passing a dictionary with one item for each type of input:

X = {'base_input': X_base, 'header_input': X_headers, 'time_input':X_time}
net.fit(X, y)
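
With a dict input like this, nolearn matches each key to the `name` of an InputLayer, so each array needs the shape that layer declares, with the batch dimension first. The sketch below uses random placeholder data; the sample count, the regression target layout and the choice between the two header layouts are assumptions. The image gets an explicit channel axis, the 87-long vector is `(N, 87)` for a DenseLayer or `(N, 1, 87)` for a Conv1DLayer, and the single scalar becomes an `(N, 1)` column. Theano-based Lasagne typically expects `float32` data.

```python
import numpy as np

n_samples = 8  # placeholder; use your real number of examples
rng = np.random.default_rng(0)

# Image: (batch, channels, rows, cols); Conv2DLayer needs the channel axis.
X_base = rng.random((n_samples, 1, 288, 1001)).astype(np.float32)

# 87-long vector: (batch, 87) if the branch starts with a DenseLayer ...
X_headers = rng.random((n_samples, 87)).astype(np.float32)
# ... or (batch, channels, length) if it starts with a Conv1DLayer.
X_headers_conv = X_headers.reshape(n_samples, 1, 87)

# One scalar per example: a (batch, 1) column, not a 1D array of length batch.
time_values = rng.random(n_samples)
X_time = time_values.reshape(-1, 1).astype(np.float32)

# Assumed regression target: one float per example, shaped (batch, 1).
y = rng.random((n_samples, 1)).astype(np.float32)

X = {'base_input': X_base, 'header_input': X_headers, 'time_input': X_time}
for key, arr in X.items():
    print(key, arr.shape, arr.dtype)
```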

1 Answer


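It is hard to answer this properly, because it depends. Without information about what you are trying to do and what data you are working with, we are playing a guessing game here, so I have to fall back on general hints.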

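First of all, ConcatLayer is perfectly right to complain. Appending a scalar to the pixel values of an image does not make much sense. So you should think about what you actually want. Most likely, that is combining the information from all three sources.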

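You are right to process the image with 2D convolutions and the sequence data with 1D convolutions. If you want to produce a single scalar value, you will probably want dense layers later on to compress the information. So it is natural to keep the low-level processing of the three branches independent and only concatenate them afterwards.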

Something like the following:

Image -> conv -> ... -> conv -> dense -> ... -> dense -> imValues
Timeseries -> conv -> ... -> conv -> dense -> ... -> dense -> seriesValues
concatLayer([imValues, seriesValues, Scalar]) -> dense -> ... -> dense with num_units=1
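
To make the first option concrete without depending on a Lasagne/Theano install, here is a plain NumPy forward pass that follows the same three-branch layout. It is not nolearn code: the weights are random and untrained, and the filter counts, kernel sizes, pool size and unit counts are arbitrary choices, not recommendations. Its point is to show that once each branch ends in a dense layer, every tensor is `(batch, features)` and concatenation on axis 1 is valid. In nolearn the same graph would be built from Conv2DLayer, Conv1DLayer, DenseLayer and ConcatLayer, with the scalar InputLayer of shape `(None, 1)` fed straight into the ConcatLayer.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv2d_valid(x, w):
    """'valid' 2D cross-correlation. x: (batch, in_ch, H, W), w: (out_ch, in_ch, kh, kw)."""
    out_ch, in_ch, kh, kw = w.shape
    oh, ow = x.shape[2] - kh + 1, x.shape[3] - kw + 1
    out = np.zeros((x.shape[0], out_ch, oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += np.einsum('bchw,fc->bfhw', x[:, :, i:i + oh, j:j + ow], w[:, :, i, j])
    return out


def conv1d_valid(x, w):
    """'valid' 1D cross-correlation. x: (batch, in_ch, L), w: (out_ch, in_ch, k)."""
    out_ch, in_ch, k = w.shape
    ol = x.shape[2] - k + 1
    out = np.zeros((x.shape[0], out_ch, ol))
    for i in range(k):
        out += np.einsum('bcl,fc->bfl', x[:, :, i:i + ol], w[:, :, i])
    return out


def max_pool2d(x, p):
    """Non-overlapping p x p max pooling; leftover rows/cols are dropped."""
    b, c, h, w = x.shape
    h, w = h // p * p, w // p * p
    return x[:, :, :h, :w].reshape(b, c, h // p, p, w // p, p).max(axis=(3, 5))


def dense(x, n_out, relu=True):
    """Fully connected layer; flattens its input first. Weights are drawn
    fresh on every call: this is a shape demo, nothing is learned."""
    x = x.reshape(x.shape[0], -1)
    y = x @ (rng.standard_normal((x.shape[1], n_out)) * 0.01)
    return np.maximum(y, 0) if relu else y


def forward(image, header, scalar):
    # Image branch: conv -> pool -> dense -> imValues, shape (batch, 16)
    h = np.maximum(conv2d_valid(image, rng.standard_normal((4, 1, 3, 3))), 0)
    im_values = dense(max_pool2d(h, 8), 16)

    # Series branch: conv1d -> dense -> seriesValues, shape (batch, 8)
    s = np.maximum(conv1d_valid(header, rng.standard_normal((4, 1, 5))), 0)
    series_values = dense(s, 8)

    # All branches are (batch, features) now, so axis-1 concatenation works.
    merged = np.concatenate([im_values, series_values, scalar], axis=1)
    return dense(dense(merged, 8), 1, relu=False)  # one scalar per example


image = rng.random((2, 1, 288, 1001))
header = rng.random((2, 1, 87))
scalar = rng.random((2, 1))
print(forward(image, header, scalar).shape)  # (2, 1)
```

Keeping the scalar out of any convolution and joining it only at the dense stage is the design choice the answer recommends: there is no spatial structure in a single value for a convolution to exploit.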

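Another, less sensible option is to inject the extra information into the low-level processing of the image. That can make sense if the local processing becomes easier when it knows the scalar/time series.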

Such an architecture might look like this:

concatLayer([seriesValues, scalar]) -> dense -> ... -> reshape((-1, N, 1, 1))
    -> Upscale2DLayer(Image.shape[2:4]) -> globalInformation
concatLayer([globalInformation, Image]) -> 2D conv filtersize=1 -> conv -> ... -> conv
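
A NumPy sketch of the second option (again not Lasagne code; `n_global` standing in for N, the stand-in `series_values` array and all weights are hypothetical placeholders). The key step is turning a per-example vector into constant feature maps of the same height and width as the image, so that it can be stacked with the image on the channel axis, where only the channel count differs.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, n_global = 2, 6                   # n_global plays the role of N
image = rng.random((batch, 1, 288, 1001))
series_values = rng.random((batch, 8))   # stand-in for the series branch output
scalar = rng.random((batch, 1))

# concatLayer([seriesValues, scalar]) -> dense (random weights, ReLU)
merged = np.concatenate([series_values, scalar], axis=1)          # (2, 9)
w = rng.standard_normal((merged.shape[1], n_global)) * 0.1
g = np.maximum(merged @ w, 0)                                     # (2, 6)

# reshape((-1, N, 1, 1)), then repeat the 1x1 maps up to rows x cols.
# Upscaling a 1x1 map by scale_factor=image.shape[2:4] with pixel repetition
# gives exactly this constant map; np.repeat reproduces it here.
g = g.reshape(-1, n_global, 1, 1)
global_information = np.repeat(np.repeat(g, image.shape[2], axis=2),
                               image.shape[3], axis=3)            # (2, 6, 288, 1001)

# concatLayer([globalInformation, Image]) on the channel axis -> 1x1 conv,
# i.e. a per-pixel linear mix of the 7 channels into 4 output channels.
stacked = np.concatenate([global_information, image], axis=1)     # (2, 7, 288, 1001)
w1x1 = rng.standard_normal((4, stacked.shape[1])) * 0.1
features = np.einsum('bchw,fc->bfhw', stacked, w1x1)              # (2, 4, 288, 1001)
print(features.shape)
```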

Note that you will almost certainly want the first option.

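One unrelated thing I noticed is the huge size of your input images. You should reduce it (resize, or use patches). Unless you have a lot of data as well as plenty of memory and compute, you will either overfit or waste hardware.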
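
Two common ways to shrink the input, sketched in NumPy. The downsampling factor 4, the patch size 96 and the stride are arbitrary examples, and the function names are hypothetical. The first averages the whole image down by an integer factor; the second cuts it into smaller patches that become separate training examples.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((3, 1, 288, 1001)).astype(np.float32)


def block_mean_downsample(x, f):
    """Shrink the last two axes by an integer factor f by averaging f x f
    blocks; rows/cols that do not fill a whole block are cropped."""
    b, c, h, w = x.shape
    h, w = h // f * f, w // f * f
    return x[:, :, :h, :w].reshape(b, c, h // f, f, w // f, f).mean(axis=(3, 5))


def extract_patches(x, size, stride):
    """Cut (size x size) patches with the given stride.
    Returns (batch * n_patches, channels, size, size), image-major order."""
    b, c, h, w = x.shape
    patches = [x[:, :, i:i + size, j:j + size]
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    # (batch, n_patches, c, size, size) -> merge batch and patch axes
    return np.stack(patches, axis=1).reshape(-1, c, size, size)


small = block_mean_downsample(images, 4)
print(small.shape)    # (3, 1, 72, 250)
patches = extract_patches(images, 96, 96)
print(patches.shape)  # (90, 1, 96, 96): 3 images x 3 rows x 10 cols of patches
```

Because the patches come out image-major, per-image targets can be aligned with them via `np.repeat(y, n_patches, axis=0)`, here with `n_patches = 30`.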

answered 2016-08-04T12:13:07.433