machine-learning - Using LIBLINEAR in transition-based dependency parsing

Question

I am going to use LIBLINEAR for transition-based dependency parsing, but I am confused about how to use it. Here is my situation:

I set up 3 feature templates for the training and testing procedures of my transition-based dependency parser:

1. the word on top of the stack
2. the word at the front of the queue
3. information from the current tree built by the steps so far

A feature in LIBLINEAR is defined as:

FeatureNode(int index, double value)

Some examples:

LABEL       ATTR1   ATTR2   ATTR3   ATTR4   ATTR5
-----       -----   -----   -----   -----   -----
1           0       0.1     0.2     0       0
2           0       0.1     0.3    -1.2     0
1           0.4     0       0       0       0
2           0       0.1     0       1.4     0.5
3          -0.1    -0.2     0.1     1.1     0.1

But I want to define my features at some stage like this (for the sentence "I love you"):

feature template 1: the word is 'love' 
feature template 2: the word is 'you'
feature template 3: the information is - the left son of 'love' is 'I'

Does this mean I have to define the features for LIBLINEAR like this? FORMAT 1 (vocabulary index: 0-I, 1-love, 2-you):

LABEL       ATTR1(template1)   ATTR2(template2)   ATTR3(template3)
-----       -----              -----              -----
SHIFT           1                 2                   0
(or LEFT-arc, 
 RIGHT-arc)

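But from what others have said, it seems features should be defined as binary values, so I would have to define a word vector such as ('I', 'love', 'you'); for example, when 'you' appears, the vector would be (0, 0, 1).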

So the features in LIBLINEAR might be as follows. FORMAT 2:

LABEL       ATTR1('I')   ATTR2('love')   ATTR3('you')
-----       -----              -----              -----
SHIFT           0                 1                   0       ->denoting the feature template 1
(or LEFT-arc, 
 RIGHT-arc)
SHIFT           0                 0                   1       ->denoting the feature template 2
(or LEFT-arc, 
 RIGHT-arc)
SHIFT           1                 0                   0       ->denoting the feature template 3
(or LEFT-arc, 
 RIGHT-arc)

Which of FORMAT 1 and FORMAT 2 is correct?

Is there anything I have misunderstood?

score 1 · Accepted Answer

Basically you have a feature vector of the form:

LABEL RESULT_OF_FEATURE_TEMPLATE_1 RESULT_OF_FEATURE_TEMPLATE_2 RESULT_OF_FEATURE_TEMPLATE_3

Liblinear and LibSVM expect you to translate it into an integer representation, with the label first and then index:value pairs:

1 1:1 2:1 3:1

Nowadays, depending on the language you use, there are lots of packages/libraries that translate the string vector into libsvm format automatically, without you having to know the details.

However, if for whatever reason you want to do it yourself, the easiest thing would be to maintain two mappings: one for labels ('shift' -> 1, 'left-arc' -> 2, 'right-arc' -> 3, 'reduce' -> 4) and one for your feature template results ('f1=I' -> 1, 'f2=love' -> 2, 'f3=you' -> 3). Basically, every time your algorithm applies a feature template, you check whether the result is already in the mapping, and if not, you add it with a new index.
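The two mappings could be sketched roughly like this in Python. The Indexer class and its grow flag are hypothetical names made up for illustration, not part of Liblinear; freezing the mapping at test time (so unseen feature strings are simply dropped) is an assumption about how you might want to handle it.

```python
class Indexer:
    """Maps strings to consecutive integer ids, starting at 1 (hypothetical helper)."""

    def __init__(self):
        self.to_id = {}

    def lookup(self, key, grow=True):
        # Add unseen keys with the next free index while training (grow=True).
        # With grow=False (e.g. at test time) unseen keys yield None instead.
        if key not in self.to_id:
            if not grow:
                return None
            self.to_id[key] = len(self.to_id) + 1
        return self.to_id[key]


labels = Indexer()
features = Indexer()

# Register the transition labels in a fixed order.
for name in ("shift", "left-arc", "right-arc", "reduce"):
    labels.lookup(name)

# Template results are plain strings; a repeated string reuses its index.
ids = [features.lookup(f) for f in ("f1=I", "f2=love", "f3=you", "f2=love")]
print(ids)  # [1, 2, 3, 2]
```

Prefixing each string with its template name ("f1=", "f2=", ...) keeps the same word apart when it comes from different templates, so 'love' on the stack and 'love' in the queue get different indexes.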

Remember that Liblinear and Libsvm expect the index:value pairs on each line to be sorted by index in ascending order.

During processing you would first apply your feature templates to the current state of your stack and queue, then translate the resulting strings to the libsvm/liblinear integer representation, and finally sort the indexes in ascending order.
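Putting those steps together, a minimal sketch might look like the following. The function names, the "f3=head<-child" string encoding, and the particular states and actions for "I love you" are all assumptions chosen for illustration; your transition system and templates may differ.

```python
def extract_features(stack, queue, left_child):
    """Apply the three templates from the question to one parser state."""
    feats = []
    if stack:
        feats.append("f1=" + stack[-1])        # template 1: word on top of the stack
    if queue:
        feats.append("f2=" + queue[0])         # template 2: word at the front of the queue
    if stack and stack[-1] in left_child:      # template 3: left child of the stack top
        feats.append("f3=" + stack[-1] + "<-" + left_child[stack[-1]])
    return feats


def to_libsvm_line(label, feats, label_ids, feat_ids):
    """Translate strings to integers and emit 'label idx:1 idx:1 ...' sorted by idx."""
    y = label_ids.setdefault(label, len(label_ids) + 1)
    # A set removes duplicates; each feature that fired gets the binary value 1.
    idx = {feat_ids.setdefault(f, len(feat_ids) + 1) for f in feats}
    return " ".join([str(y)] + [f"{i}:1" for i in sorted(idx)])


label_ids, feat_ids = {}, {}

# (stack, queue, left children found so far, action) for a few illustrative steps.
states = [
    (["I"], ["love", "you"], {}, "shift"),
    (["I", "love"], ["you"], {}, "left-arc"),
    (["love"], ["you"], {"love": "I"}, "shift"),
]

lines = [
    to_libsvm_line(action, extract_features(stack, queue, kids), label_ids, feat_ids)
    for stack, queue, kids, action in states
]
print("\n".join(lines))
# 1 1:1 2:1
# 2 3:1 4:1
# 1 3:1 4:1 5:1
```

Only the features that fired are written out; every other index is implicitly 0, which is what makes this a sparse version of the binary vectors in FORMAT 2.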
