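I've been working on my project Deep Learning Language Detection, a network with the following layers that identifies which of 16 programming languages a file is written in:
This is the code that builds the network: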
from keras.layers import (Activation, Concatenate, Conv1D, Dense, Dropout,
                          Flatten, Input, MaxPooling1D)
from keras.models import Model, Sequential

# Setting up the model: one convolution branch per filter size
graph_in = Input(shape=(sequence_length, number_of_quantised_characters))
convs = []
for i in range(0, len(filter_sizes)):
    conv = Conv1D(filters=num_filters,
                  kernel_size=filter_sizes[i],
                  padding='valid',
                  activation='relu',
                  strides=1)(graph_in)
    pool = MaxPooling1D(pool_size=pooling_sizes[i])(conv)
    flatten = Flatten()(pool)
    convs.append(flatten)

# Merge the branches (Concatenate needs at least two inputs)
if len(filter_sizes) > 1:
    out = Concatenate()(convs)
else:
    out = convs[0]

graph = Model(inputs=graph_in, outputs=out)

# main sequential model
model = Sequential()
model.add(Dropout(dropout_prob[0], input_shape=(sequence_length, number_of_quantised_characters)))
model.add(graph)
model.add(Dense(hidden_dims))
model.add(Dropout(dropout_prob[1]))
model.add(Dense(number_of_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
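My last language class is SQL, and in the test phase the network never predicts SQL correctly: it scores 0%. I assumed this was down to the poor quality of the SQL samples (and they really are poor), so I removed that class and trained on the remaining 15 classes. To my surprise, F# files now had a 0% detection rate, and F# was the last class once SQL was removed (i.e. the class whose one-hot vector has a 1 in the last position and 0 everywhere else). Meanwhile, if the network trained on 16 classes is tested against only the 15 non-SQL classes, it reaches a very high success rate of about 98.7%. So the failure seems to follow whichever class sits in the last position rather than the language itself.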
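One check I can think of for this, sketched below under assumptions, is to count how many training samples actually end up in each one-hot column. If the last column is empty (or nearly empty) in the labels fed to the network, the problem would be in label construction rather than in the model. The function names here are hypothetical and not taken from defs.py or data_helper.py, and the toy array merely stands in for the real label matrix.

```python
import numpy as np

def one_hot_column_counts(labels):
    """Count how many samples are assigned to each class column."""
    labels = np.asarray(labels)
    if labels.ndim != 2:
        raise ValueError("expected a 2-D array of one-hot rows")
    # A valid one-hot row contains exactly one 1.
    if not np.all(labels.sum(axis=1) == 1):
        raise ValueError("every row must contain exactly one 1")
    return labels.sum(axis=0).astype(int)

def empty_classes(labels):
    """Return the indices of class columns that never receive a sample."""
    counts = one_hot_column_counts(labels)
    return [int(i) for i in np.flatnonzero(counts == 0)]

# Toy stand-in for the real label matrix: 4 classes declared,
# but the samples only ever use classes 0, 1 and 2.
toy_labels = np.eye(4)[[0, 1, 2, 0, 1, 2]]
print(one_hot_column_counts(toy_labels))  # [2 2 2 0]
print(empty_classes(toy_labels))          # [3]
```

Running the same check separately on the training and the test labels would show whether the last class is missing from one of them only.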
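The code I'm using is very simple and lives mainly in defs.py and data_helper.py.
Here are the results of the network trained on 16 classes and tested against 16 classes: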
Final result: 14827/16016 (0.925761738262)
xml: 995/1001 (0.994005994006)
fsharp: 974/1001 (0.973026973027)
clojure: 993/1001 (0.992007992008)
java: 996/1001 (0.995004995005)
scala: 990/1001 (0.989010989011)
python: 983/1001 (0.982017982018)
sql: 0/1001 (0.0)
js: 991/1001 (0.99000999001)
cpp: 988/1001 (0.987012987013)
css: 987/1001 (0.986013986014)
csharp: 994/1001 (0.993006993007)
go: 989/1001 (0.988011988012)
php: 998/1001 (0.997002997003)
ruby: 995/1001 (0.994005994006)
powershell: 992/1001 (0.991008991009)
bash: 962/1001 (0.961038961039)
And here are the results of the same network (trained on 16 classes) run against 15 classes:
Final result: 14827/15015 (0.987479187479)
xml: 995/1001 (0.994005994006)
fsharp: 974/1001 (0.973026973027)
clojure: 993/1001 (0.992007992008)
java: 996/1001 (0.995004995005)
scala: 990/1001 (0.989010989011)
python: 983/1001 (0.982017982018)
js: 991/1001 (0.99000999001)
cpp: 988/1001 (0.987012987013)
css: 987/1001 (0.986013986014)
csharp: 994/1001 (0.993006993007)
go: 989/1001 (0.988011988012)
php: 998/1001 (0.997002997003)
ruby: 995/1001 (0.994005994006)
powershell: 992/1001 (0.991008991009)
bash: 962/1001 (0.961038961039)
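The per-class figures above only say how often each class is right, not what the failing class is mistaken for. A minimal sketch of a confusion matrix that would show this follows. It assumes integer class indices are available, for example by taking `argmax(axis=1)` of the one-hot test labels and of the output of `model.predict`. The helper names and the toy data are hypothetical.

```python
import numpy as np

def confusion_matrix(true_idx, pred_idx, n_classes):
    """Rows are true classes, columns are predicted classes."""
    true_idx = np.asarray(true_idx)
    pred_idx = np.asarray(pred_idx)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (true_idx, pred_idx), 1)
    return cm

def per_class_accuracy(cm):
    """Fraction of each true class predicted correctly (0 for empty rows)."""
    totals = cm.sum(axis=1)
    correct = np.diag(cm)
    return np.divide(correct, totals,
                     out=np.zeros(len(totals)), where=totals > 0)

# Toy data: the last class (index 2) is never predicted.
true_idx = [0, 0, 1, 1, 2, 2]
pred_idx = [0, 0, 1, 1, 0, 1]
cm = confusion_matrix(true_idx, pred_idx, n_classes=3)
print(cm[2])                   # where the last class's samples went: [1 1 0]
print(per_class_accuracy(cm))  # [1. 1. 0.]
```

If the column for the last class is all zeros, the network never outputs that class at all, which would be a different situation from confusing it with one specific language.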
Has anyone else seen this? How can I get around it?