python - TensorFlow Training model on image and text features, with multi class outputs

Question

I have a dataset that includes both images and text features. The labels for the training data are a two-dimensional array of 1s/0s with the same shape as the input images.

So basically, the training inputs are:

An input image with shape (X, Y),
An additional feature set (i.e. text features) with shape (Z,).

And the training labels have shape (X, Y).

I am trying to train a model on this data using TensorFlow/Keras. I know I can train a model whose input size is (X * Y) + Z, but I've read that this isn't the best way to combine image features with additional data features.

So my questions are:

1) How would I set up my model to handle the mixed input types?

2) Since my output is the same size as my image, would I need to define an output layer of size X * Y? How would I specify the output layer so that it can take multiple values, that is, so that any one or more locations in the output can be 1 or 0?

score 2 · Accepted Answer

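One approach is to define two separate sub-models to process the text and image data, and then merge the outputs of these sub-models to create the final model: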

---------------        ---------------
- Input Image -        - Input Text  -
---------------        ---------------
       |                       |
       |                       |
       |                       |
---------------        ---------------------  
- Image Model -        -     Text Model    -
- (e.g. CNNs) -        - (e.g. Embeddings, -
---------------        -  LSTM, Conv1D)    -
       \               ---------------------
        \                     /
         \                   /
          \                 /
           \               /
            \             /
             \           /
              \         /
               \       /
           ----------------------
           -      Merge         -
           - (e.g. concatenate) -
           ----------------------
                     |
                     |
                     |
           ----------------------
           -      Upsample      -
           - (e.g. Dense layer, -
           -   transpose-conv)  -
           ----------------------
                     |
                     |
                     |
                -----------
                -  Output -
                -----------

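Each box corresponds to one or more layers. There are different ways you could implement them and set their parameters, though I've mentioned some suggestions in each box.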
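A minimal sketch of the diagram using the Keras functional API follows. It assumes the images are single-channel and expanded to shape (X, Y, 1), and that the Z text features are already numeric vectors (e.g. precomputed embeddings or TF-IDF). If they are raw token sequences, the Dense text branch would be replaced by Embedding + LSTM/Conv1D as the diagram suggests. The sizes, layer widths, and the names `build_model`, `image`, `text` and `mask` are hypothetical. It uses the Dense-layer option for the upsample box. For question 2, the relevant part is the final `sigmoid` activation combined with `binary_crossentropy`: every output position becomes an independent 0/1 prediction, so any number of positions can be 1 at once. A `softmax` would instead force the positions to compete for a single label.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes: X x Y image, Z text features (adjust to your data).
X, Y, Z = 32, 32, 20


def build_model(x=X, y=Y, z=Z):
    # Image branch: a small CNN over a single-channel (x, y, 1) image.
    img_in = layers.Input(shape=(x, y, 1), name="image")
    h = layers.Conv2D(16, 3, padding="same", activation="relu")(img_in)
    h = layers.MaxPooling2D()(h)
    h = layers.Conv2D(32, 3, padding="same", activation="relu")(h)
    h = layers.MaxPooling2D()(h)
    h = layers.Flatten()(h)

    # Text branch: assumes the z features are already numeric vectors,
    # so a Dense layer is enough here (use Embedding/LSTM for raw tokens).
    txt_in = layers.Input(shape=(z,), name="text")
    t = layers.Dense(64, activation="relu")(txt_in)

    # Merge box: concatenate the two feature vectors.
    merged = layers.Concatenate()([h, t])
    m = layers.Dense(128, activation="relu")(merged)

    # Upsample box (Dense variant): one unit per output position.
    # Sigmoid gives each position its own independent 0..1 probability.
    out = layers.Dense(x * y, activation="sigmoid")(m)
    out = layers.Reshape((x, y), name="mask")(out)

    model = Model(inputs=[img_in, txt_in], outputs=out)
    # Binary cross-entropy treats every position as a separate binary label.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["binary_accuracy"])
    return model


if __name__ == "__main__":
    # Random stand-in data, only to show the expected shapes.
    rng = np.random.default_rng(0)
    n = 8
    images = rng.random((n, X, Y, 1)).astype("float32")
    text = rng.random((n, Z)).astype("float32")
    labels = rng.integers(0, 2, size=(n, X, Y)).astype("float32")

    model = build_model()
    model.fit([images, text], labels, epochs=1, verbose=0)
    preds = model.predict([images, text], verbose=0)
    print(preds.shape)  # (8, 32, 32)
```

At inference time, thresholding the predictions (e.g. `preds > 0.5`) turns them back into a 0/1 array with the same shape as the labels. Swapping the final Dense + Reshape for a few `Conv2DTranspose` layers is the other option listed in the upsample box; it usually needs far fewer parameters when X * Y is large.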
