I am trying to parse a .conll file from this GitHub repo. Here is a sample of my parsing code:
from io import open
from conllu import parse_incr  # the loop below calls parse_incr, so import that name
import glob

for filename in glob.glob('./licenses-conll-format/22-MIT/MIT_permissionCopy.conll'):
    # the with-block closes the file once parsing is done
    with open(filename, "r", encoding="utf-8") as data_file:
        for tokenlist in parse_incr(data_file):
            print(tokenlist.serialize())
Output:
24 Permission _ NN NN _ 27 nsubjpass _ _
25 is _ VBZ VBZ _ 27 auxpass _ _
26 hereby _ RB RB _ 27 advmod _ _
27 granted _ VBN VBN _ 11 rcmod _ _
28 , _ , , _ 27 punct _ _
29 free _ JJ JJ _ 27 advmod _ _
30 of _ IN IN _ 0 erased _ _
31 charge _ NN NN _ 29 prep_of _ _
This seems to be missing some of the annotations present in the original .conll file (I-PERMISSION, B-PERMISSION, etc.):
24 Permission _ NN NN _ 27 nsubjpass _ _ B-PERMISSION COPY
25 is _ VBZ VBZ _ 27 auxpass _ _ I-PERMISSION
26 hereby _ RB RB _ 27 advmod _ _ I-PERMISSION
27 granted _ VBN VBN _ 11 rcmod _ _ I-PERMISSION
28 , _ , , _ 27 punct _ _ O
29 free _ JJ JJ _ 27 advmod _ _ I-PERMISSION
30 of _ IN IN _ 0 erased _ _ I-PERMISSION
31 charge _ NN NN _ 29 prep_of _ _ I-PERMISSION
32 , _ , , _ 27 punct _ _ O
Any ideas on how to get all of the annotations?
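For reference, here is a minimal sketch of reading the extra columns by hand with only the standard library, so the annotations can at least be inspected independently of any parser. The column names are assumptions: the first ten follow the usual CoNLL-U order, while `label` and `label_type` are hypothetical names for the eleventh and twelfth columns shown above. It also assumes fields never contain whitespace. If staying with conllu, its parse functions appear to accept a `fields` argument for naming columns, which may be worth checking in its documentation.

```python
# Assumed column names: ten CoNLL-U-style fields plus two hypothetical
# names ("label", "label_type") for the extra annotation columns.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats",
          "head", "deprel", "deps", "misc", "label", "label_type"]


def parse_conll_line(line):
    """Split one token line on whitespace and pair the values with FIELDS."""
    values = line.split()
    # zip stops at the shorter sequence, so an 11-column row
    # simply has no "label_type" key.
    return dict(zip(FIELDS, values))


def read_tokens(lines):
    """Yield one dict per token line, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield parse_conll_line(line)


# Sample rows copied from the file excerpt in the question.
sample = [
    "24 Permission _ NN NN _ 27 nsubjpass _ _ B-PERMISSION COPY",
    "25 is _ VBZ VBZ _ 27 auxpass _ _ I-PERMISSION",
    "28 , _ , , _ 27 punct _ _ O",
]

tokens = list(read_tokens(sample))
for tok in tokens:
    print(tok["form"], tok.get("label"), tok.get("label_type"))
```

To run this against the real file, pass an open file object to `read_tokens` in place of `sample`.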