I have 8 features for a maxent classifier, and I want to know the weight of each one, because I need to understand how important each feature is.
```python
for sentence_pairs in list:  # `list` is my own variable (it shadows the built-in)
    # One featureset per item; every feature is a count starting at 0
    features = {'a': 0, 'b': 0, 'c': 0, 'd': 0,
                'e': 0, 'f': 0, 'g': 0, 'h': 0}
    for pair in sentence_pairs:
        # Each pair looks like "lexical/morph+lexical/morph"
        first, second = pair.split('+')
        first_lexical, first_morph = first.split('/')
        second_lexical, second_morph = second.split('/')
        if first_lexical == second_lexical:
            features['a'] += 1
        if first_morph == second_morph:
            features['b'] += 1
        if "JC" in first_morph:
            features['d'] += 1
        elif first_lexical == second_lexical:
            if "EF" in first_morph:
                features['d'] += 1
            elif "EP" in first_morph:
                features['e'] += 1
            elif "XS" in first_morph:
                features['f'] += 1
            elif "JX" in first_morph:
                features['g'] += 1
            elif "JC" in first_morph:
                features['h'] += 1
```
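I'm using maximum entropy to compute the structural similarity between two sentences, so my features are counts of identical morphemes. That is why the feature values are not just 0 or 1.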
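If this is NLTK's `MaxentClassifier` (an assumption; the question does not name the library), its binary feature encoding does not learn one weight per feature name. It creates one binary joint-feature per distinct (feature name, feature value, label) combination seen in training, so count-valued features multiply the number of weights. The sketch below imitates that mapping on hypothetical toy data; the function name and data are illustrative, not NLTK's code.

```python
from typing import Dict, List, Tuple

FeatureSet = Dict[str, int]
JointKey = Tuple[str, int, str]  # (feature name, feature value, label)


def build_joint_feature_index(
    train: List[Tuple[FeatureSet, str]],
) -> Dict[JointKey, int]:
    """Assign one weight index to every (name, value, label) triple seen in training.

    Simplified imitation of a binary joint-feature encoding: each distinct
    count value of a feature gets its own weight for each label it occurs with.
    """
    index: Dict[JointKey, int] = {}
    for featureset, label in train:
        for name, value in sorted(featureset.items()):
            key = (name, value, label)
            if key not in index:
                index[key] = len(index)
    return index


# Hypothetical training data: two count features, two labels.
toy_train = [
    ({"a": 0, "b": 1}, "similar"),
    ({"a": 2, "b": 1}, "similar"),
    ({"a": 3, "b": 0}, "different"),
]

joint_index = build_joint_feature_index(toy_train)
print(len(joint_index))  # 2 feature names, but 5 weights
```

With 8 count features, several labels, and many distinct count values, a vector of 64 weights is therefore plausible.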
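When I run this code:

```python
print(classifier.weights())
```

it prints a list of 64 elements. I expected it to print only 8 elements (one weight per feature), but it returns the following: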
```
[ 1.74089048  2.66009496  1.42702806  0.14474766  0.14210167  0.15642977
  0.07329622  0.19233666  0.30679333  1.05599702  1.60007152 -0.17416653
  0.09417338  0.16386887  0.27088739 -0.72500181 -8.48476894  0.2924295
  0.29734346  0.28692798  1.24685007  1.13583538  0.34032173  0.97472507
  1.21521307  1.31532032  1.57745202  0.5204001   0.76549421  1.79209505
  0.44465357  0.73647553 -1.08840863  7.89243891  1.08035386 10.01641604
  1.12682947  0.37774782  0.85929749  0.16311825  0.45568935 -0.04190585
 -0.06698004 -0.08507122 -0.02308924 -0.10700906  0.10775206  0.66603408
 -0.39178407  0.13196092  0.09278365  0.36485199  0.64181725 -3.63790857
  2.32751187 -0.87754617  0.63697054 -3.16749379 -8.87589551  0.1192744
 -2.68618694 -3.6713022  -3.79744038 -1.1949963 ]
```
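Assuming an NLTK-style `MaxentClassifier`, each of the 64 printed values can be labeled with the joint-feature it belongs to. The sketch relies on the classifier's private `_encoding` attribute and the encoding's `describe(f_id)` method; both are NLTK internals as I understand them and may differ between versions. The helper name `describe_weights` is hypothetical.

```python
from typing import List, Tuple


def describe_weights(classifier) -> List[Tuple[str, float]]:
    """Pair every weight with a text description of its joint-feature.

    Assumes `classifier.weights()` returns the weight vector and that the
    private `classifier._encoding` object provides `describe(f_id)`.
    """
    encoding = classifier._encoding  # private attribute: may change between versions
    return [(encoding.describe(f_id), float(w))
            for f_id, w in enumerate(classifier.weights())]


# Usage with a trained classifier (requires NLTK):
# for text, w in describe_weights(classifier):
#     print(f"{w:+.4f}  {text}")
```

NLTK's `MaxentClassifier` also offers `show_most_informative_features(n)`, which prints the highest-magnitude joint-features without touching private attributes.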
I'd like to know what each element means and how to get the weight of each of my 8 features.
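Because one feature name such as `a` owns several weights (one per count value and label), there is no single "weight of `a`" to read off a binary-encoded model. One crude summary is to group the joint-feature weights by feature name and take the largest absolute weight per name. The mapping and weights below are made-up toy values; with NLTK, the (name, value, label) to index mapping appears to live in the encoding's private `_mapping` attribute, which is an assumption about its internals. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

JointKey = Tuple[str, int, str]  # (feature name, feature value, label)


def weights_by_feature(
    joint_index: Dict[JointKey, int], weights: List[float]
) -> Dict[str, List[Tuple[int, str, float]]]:
    """Group weights by feature name as (value, label, weight) entries."""
    grouped: Dict[str, List[Tuple[int, str, float]]] = defaultdict(list)
    for (name, value, label), f_id in joint_index.items():
        grouped[name].append((value, label, weights[f_id]))
    return dict(grouped)


def feature_importance(grouped: Dict[str, List[Tuple[int, str, float]]]) -> Dict[str, float]:
    """Crude per-feature score: the largest absolute weight among its joint-features."""
    return {name: max(abs(w) for _, _, w in entries)
            for name, entries in grouped.items()}


# Hypothetical mapping and weights (not taken from the question's model).
toy_index = {("a", 0, "similar"): 0, ("b", 1, "similar"): 1,
             ("a", 2, "similar"): 2, ("a", 3, "different"): 3,
             ("b", 0, "different"): 4}
toy_weights = [0.5, -1.2, 2.0, -0.3, 0.1]

grouped = weights_by_feature(toy_index, toy_weights)
for name, score in sorted(feature_importance(grouped).items()):
    print(name, score)
```

This score ignores how often each count value occurs, so treat it as a rough ranking rather than a true per-feature weight.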