I am working on a sentence-level binary classification task. Each sample in my data consists of 3 token subarrays: left context, core, and right context.
I designed several alternative convolutional neural networks in Keras to find out which one fits my problem best.
I am new to Python and Keras, so I decided to start with simpler solutions and test which changes improve my metrics (accuracy, precision, recall, F1, and AUC-ROC). The first simplification concerns the input data: I decided to ignore the contexts and build a sequential Keras model over the core only:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 500) 0
_________________________________________________________________
masking_1 (Masking) (None, 500) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 500, 100) 64025600
_________________________________________________________________
conv1d_1 (Conv1D) (None, 497, 128) 51328
_________________________________________________________________
average_pooling1d_1 (Average (None, 62, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 62, 128) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 61, 256) 65792
_________________________________________________________________
dropout_2 (Dropout) (None, 61, 256) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 54, 32) 65568
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 32) 0
_________________________________________________________________
dense_1 (Dense) (None, 16) 528
_________________________________________________________________
dropout_3 (Dropout) (None, 16) 0
_________________________________________________________________
dense_2 (Dense) (None, 2) 34
=================================================================
As you can see, I use a fixed-size input, so I applied padding as a preprocessing step. I also use an embedding layer initialized from a Word2Vec model.
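The padding step looks roughly like this (a minimal numpy sketch; `MAX_LEN = 500` is taken from the model summary above, and the token ids are hypothetical):

```python
import numpy as np

MAX_LEN = 500  # fixed input length, matching the (None, 500) shape in the summary


def pad_tokens(token_ids, max_len=MAX_LEN, pad_value=0):
    """Truncate to max_len, then right-pad with pad_value to a fixed length."""
    token_ids = list(token_ids)[:max_len]
    padding = [pad_value] * (max_len - len(token_ids))
    return np.array(token_ids + padding)


core = [17, 4, 231, 9]  # hypothetical token ids for one sentence core
x = pad_tokens(core)
print(x.shape)  # (500,)
```

The zero `pad_value` is what the Masking layer in the summary is then configured to skip.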
This model returns the following results:
P        0.875457875
R        0.878676471
F1       0.87706422
AUC-ROC  0.906102654
Next, I want to select one subarray of the input data inside my CNN via a Lambda layer. I use the following Lambda layer definition:
Lambda(lambda x: x[:, 1], output_shape=(500,))(input)
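As a quick sanity check of what that slice does, the same indexing expression can be run on a numpy array with the input shape from the second summary, (batch, 3, 500) (the token ids here are hypothetical):

```python
import numpy as np

# A batch of 2 samples, each with 3 padded subarrays of length 500:
# index 0 = left context, 1 = core, 2 = right context.
batch = np.zeros((2, 3, 500), dtype=np.int64)
batch[:, 1, :4] = [17, 4, 231, 9]  # put hypothetical core tokens in slot 1

core_only = batch[:, 1]  # same expression as in the Lambda layer
print(core_only.shape)   # (2, 500)
```

So `x[:, 1]` drops the subarray axis and keeps only the core, which is why the layer after the Lambda sees shape `(None, 500)`.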
Here is the summary of my new CNN (as you can see, it is almost identical to the previous one apart from the input and Lambda layers):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3, 500) 0
_________________________________________________________________
lambda_1 (Lambda) (None, 500) 0
_________________________________________________________________
masking_1 (Masking) (None, 500) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 500, 100) 64025600
_________________________________________________________________
conv1d_1 (Conv1D) (None, 497, 128) 51328
_________________________________________________________________
average_pooling1d_1 (Average (None, 62, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 62, 128) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 61, 256) 65792
_________________________________________________________________
dropout_2 (Dropout) (None, 61, 256) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 54, 32) 65568
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 32) 0
_________________________________________________________________
dense_1 (Dense) (None, 16) 528
_________________________________________________________________
dropout_3 (Dropout) (None, 16) 0
_________________________________________________________________
dense_2 (Dense) (None, 2) 34
=================================================================
But the results are dismal: accuracy plateaus at 60%, and compared to the first model, precision, recall, and F1 are far too low (< 0.10).
I don't understand what is happening, or whether these two networks differ more than I think they do.
Any clues about this problem?