I am working on a long-document classification task where documents run to more than 10,000 words. My plan is to use BERT as a paragraph encoder and then feed the paragraph embeddings, step by step, into a BiLSTM. The network looks like this:
Input: (batch_size, max_paragraph_len, max_tokens_per_para, embedding_size)
BERT layer: (max_paragraph_len, paragraph_embedding_size)
LSTM layer: ???
Output layer: (batch_size, classification_size)
How can I implement this in Keras? I am loading the BERT model with keras_bert's load_trained_model_from_checkpoint:
from keras_bert import load_trained_model_from_checkpoint

# layer_num is the number of transformer layers in the checkpoint (12 for BERT-base).
# Only the adapter and layer-norm weights are left trainable (adapter fine-tuning).
bert_model = load_trained_model_from_checkpoint(
    config_path,
    model_path,
    training=False,
    use_adapter=True,
    trainable=['Encoder-{}-MultiHeadSelfAttention-Adapter'.format(i + 1) for i in range(layer_num)] +
              ['Encoder-{}-FeedForward-Adapter'.format(i + 1) for i in range(layer_num)] +
              ['Encoder-{}-MultiHeadSelfAttention-Norm'.format(i + 1) for i in range(layer_num)] +
              ['Encoder-{}-FeedForward-Norm'.format(i + 1) for i in range(layer_num)],
)
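For context, here is a minimal sketch of the wiring I have in mind. Since TimeDistributed does not accept multi-input models, I flatten the paragraph axis into the batch axis so the shared BERT encodes one paragraph per row, take the [CLS] position as the paragraph embedding, and restore the paragraph axis before the BiLSTM. The constants max_paragraphs, max_tokens, hidden_size, and num_classes are placeholders for my setup; I am not sure this is the idiomatic way to do it:

import keras
from keras import layers
import keras.backend as K

max_paragraphs = 32   # placeholder: cap on paragraphs per document
max_tokens = 128      # placeholder: cap on tokens per paragraph
hidden_size = 768     # BERT-base hidden size
num_classes = 5       # placeholder: number of target classes

# Document-level inputs: token ids and segment ids per paragraph.
token_in = layers.Input(shape=(max_paragraphs, max_tokens), dtype='int32')
seg_in = layers.Input(shape=(max_paragraphs, max_tokens), dtype='int32')

# Fold the paragraph axis into the batch axis:
# (batch, max_paragraphs, max_tokens) -> (batch * max_paragraphs, max_tokens).
flat_tokens = layers.Lambda(lambda t: K.reshape(t, (-1, max_tokens)))(token_in)
flat_segs = layers.Lambda(lambda t: K.reshape(t, (-1, max_tokens)))(seg_in)

# bert_model is the model loaded above; it returns per-token embeddings
# of shape (batch * max_paragraphs, max_tokens, hidden_size).
seq_out = bert_model([flat_tokens, flat_segs])

# Use the [CLS] position as the paragraph embedding, then restore
# the paragraph axis: (batch, max_paragraphs, hidden_size).
cls = layers.Lambda(lambda t: t[:, 0, :])(seq_out)
para_emb = layers.Lambda(
    lambda t: K.reshape(t, (-1, max_paragraphs, hidden_size)))(cls)

# BiLSTM over the paragraph sequence; keep only the final state.
doc_emb = layers.Bidirectional(layers.LSTM(128))(para_emb)

# Document-level classification head.
out = layers.Dense(num_classes, activation='softmax')(doc_emb)

model = keras.models.Model([token_in, seg_in], out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

The reshape trick keeps a single BERT whose weights are shared across all paragraphs, and the BiLSTM then runs over the sequence of per-paragraph [CLS] vectors. Is this the right way to connect the two, and what should the LSTM layer's shape be?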