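I am trying to compute the probability distributions for Viterbi by hand and then use them in cross-validation.

I have a corpus of sentences (corps) in which every word is paired with its tag, roughly like this: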

[('I', 'O'), ('go', 'B-Verb:go'), ('where', 'i'), ('want', 'B-Verb:want'), ('to', 'O'), ('go', 'B-Verb:go')]

I then split my data into training and test sets and build two flat lists of (word, tag) pairs:

from sklearn.model_selection import train_test_split

train_set, test_set = train_test_split(corps, train_size=0.90, test_size=0.10, random_state=101)
train_tagged_words = [tup for sent in train_set for tup in sent]
test_tagged_words = [tup for sent in test_set for tup in sent]

Get the tags:

tags = {tag for word,tag in train_tagged_words}

But when I try to compute the transition matrix for my tags, the code takes forever. Here is the code:

tags_matrix = np.zeros((len(tags), len(tags)), dtype='float32')
for i, t1 in enumerate(list(tags)):
    for j, t2 in enumerate(list(tags)): 
        tags_matrix[i, j] = t2_given_t1(t2, t1)[0]/t2_given_t1(t2, t1)[1]


print(tags_matrix)
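
For comparison, here is a minimal sketch of how the same matrix might be built from counts collected in a single pass, rather than calling a counting function for every cell. `t2_given_t1` is not shown above, so this assumes it returns the pair (number of times t1 is followed by t2, number of times t1 occurs) over the flattened training list. The name `transition_matrix`, the sorted tag order, and the plain list-of-lists result are illustrative choices, not part of my original code.

```python
from collections import Counter

def transition_matrix(tagged_words):
    """Return (tag_list, matrix) where matrix[i][j] estimates P(tag_list[j] | tag_list[i])."""
    tag_seq = [tag for _, tag in tagged_words]
    tag_list = sorted(set(tag_seq))          # fixed order so rows/columns are reproducible
    index = {tag: i for i, tag in enumerate(tag_list)}

    # One pass over adjacent tag pairs instead of one scan of the data per matrix cell.
    bigram_counts = Counter(zip(tag_seq, tag_seq[1:]))
    unigram_counts = Counter(tag_seq)

    matrix = [[0.0] * len(tag_list) for _ in tag_list]
    for (t1, t2), n in bigram_counts.items():
        matrix[index[t1]][index[t2]] = n / unigram_counts[t1]
    return tag_list, matrix

# Small check on the example sentence from above.
sample = [('I', 'O'), ('go', 'B-Verb:go'), ('where', 'i'),
          ('want', 'B-Verb:want'), ('to', 'O'), ('go', 'B-Verb:go')]
tag_list, matrix = transition_matrix(sample)
```

With `transition_matrix(train_tagged_words)` the work would be proportional to the number of tagged words plus the number of cells, and the result could be wrapped in `np.array(matrix, dtype='float32')` if a NumPy array is needed.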

Am I doing something wrong?
