python - Why is the input mask all the same number in a BERT language model?

Question

For a text classification task, I fine-tuned BERT and got the output below. Why is input_mask all 1s?

```
# to_feature_map is a function.
to_feature_map("hi how are you doing",0)

({'input_mask': <tf.Tensor: shape=(64,), dtype=int32, numpy=
  array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        dtype=int32)>,
  'input_type_ids': <tf.Tensor: shape=(64,), dtype=int32, numpy=
  array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        dtype=int32)>,
  'input_word_ids': <tf.Tensor: shape=(64,), dtype=int32, numpy=
  array([ 101, 7632, 2129, 2024, 2017, 2725,  102,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0], dtype=int32)>},
 <tf.Tensor: shape=(), dtype=int32, numpy=0>)```

score 2 · Accepted Answer

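The input mask lets the model cleanly distinguish content from padding. The mask has the same shape as the input IDs and contains a 1 at every position where the input ID is not padding. In the output above, the first 7 positions (`[CLS]` = 101, the five word tokens, and `[SEP]` = 102) are 1, and the remaining 57 positions, which are padding up to the sequence length of 64, are 0.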
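To make the definition above concrete, here is a minimal sketch in plain Python that rebuilds the mask from the token IDs shown in the question. The helper names `pad_ids` and `build_input_mask` are hypothetical and are not part of the asker's `to_feature_map`. The sketch also assumes that padding uses token ID 0, as it does in the printed output. Real preprocessing code typically derives the mask from the unpadded sequence length instead, which gives the same result here.

```python
# Hypothetical helpers for illustration only; not the asker's to_feature_map.
PAD_ID = 0  # assumption: padding positions hold token ID 0, as in the output above


def pad_ids(token_ids, max_len, pad_id=PAD_ID):
    """Truncate or right-pad a list of token IDs to exactly max_len entries."""
    ids = list(token_ids)[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


def build_input_mask(word_ids, pad_id=PAD_ID):
    """Return 1 for every real token and 0 for every padding position."""
    return [0 if token_id == pad_id else 1 for token_id in word_ids]


# Token IDs from the question: [CLS] hi how are you doing [SEP]
token_ids = [101, 7632, 2129, 2024, 2017, 2725, 102]

input_word_ids = pad_ids(token_ids, max_len=64)
input_mask = build_input_mask(input_word_ids)

print(sum(input_mask))   # 7 real tokens
print(input_mask[:10])   # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

So the mask is not all one number: it is 1 over the real tokens and 0 over the padding, and a longer sentence would produce more leading 1s.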
