
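I am working on NLP with nltk. I use chunking to extract person names. After chunking, I want to replace each name chunk with a fixed string, either "Male" or "Female".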

My code is:

import nltk

with open('male_names.txt') as f1:
    male = [line.rstrip('\n') for line in f1]
with open('female_names.txt') as f2:
    female = [line.rstrip('\n') for line in f2]

with open("input.txt") as f:
    text = f.read()

words = nltk.word_tokenize(text)
tagged = nltk.pos_tag(words)
chunkregex = r"""Name: {<NNP>+}"""
chunkParser = nltk.RegexpParser(chunkregex)
chunked = chunkParser.parse(tagged)

for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Name'):
    chunk=[]
    for word, pos in subtree:
        chunk.append(word)
        temp = " ".join(chunk)
    **if temp in male:
        subtree = ('Male', pos)
    if temp in female:
        subtree = ('Female', pos)**
    print subtree

print chunked

My input data is:

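Captain Jack Sparrow arrives in Port Royal in Jamaica to commandeer a ship. Despite rescuing Elizabeth Swann, the daughter of Governor Weatherby Swann, from drowning, he is jailed for piracy.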

The current output is:

(S (Name Captain/NNP Jack/NNP Sparrow/NNP) arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG (Name Elizabeth/NNP Swann/NNP) ,/, the/DT daughter/NN of/IN (Name Governor/NNP Weatherby/NNP Swann/NNP) ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VBN for/IN piracy/NN ./.)

I want to replace these chunks with 'Male' or 'Female', so the output should be:

(S Male/NNP arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG Female/NNP ,/, the/DT daughter/NN of/IN Male/NNP ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VBN for/IN piracy/NN ./.)

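The bold part of the code (the two if blocks marked with **) does not do what it is supposed to do. The print subtree statement shows the change, but print chunked does not.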

Am I doing something wrong, or is there another way to do this?
I am new to Python and nltk. Any help is appreciated.

male and female contain the lists of names:

["Captain Jack Sparrow", "Governor Weatherby Swann", "Robin"]

["Elizabeth Swann", "Jenny"]


1 Answer


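I am not sure I have understood your question correctly. An NLTK subtree is a Tree, which is a subclass of the ordinary Python list, so normal list operations work on it. Assigning to the loop variable subtree only rebinds that name, whereas slice assignment changes the subtree in place. Try this snippet in place of the loop in your code.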

for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Name'):
    # The tokenizer split the name into words, so join them back together.
    full_name = " ".join(word for word, _ in subtree)
    pos = subtree[-1][1]  # keep the POS tag of the chunk's last token
    if full_name in male:
        # Slice assignment replaces the subtree's contents in place.
        subtree[:] = [("Male", pos)]
    elif full_name in female:
        subtree[:] = [("Female", pos)]

Output:

> (S (Name Male/NNP) arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG (Name Female/NNP) ,/, the/DT daughter/NN of/IN (Name Male/NNP) ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VB for/IN piracy/NN ./.)
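
The output above still wraps each replacement in a Name chunk, whereas the question asks for a bare Male/NNP leaf. One possible way to get that, sketched here under assumptions, is to walk the parent tree by index and assign the new leaf to tree[i], which also changes the tree in place. The helper replace_name_chunks, the set-based name lists, and the small hand-built tree are illustrative assumptions and not part of the original code; the sketch only handles Name chunks that sit directly under the root, which is what this chunk grammar produces.

```python
from nltk import Tree

# Hypothetical name sets standing in for male_names.txt / female_names.txt.
male = {"Captain Jack Sparrow", "Governor Weatherby Swann", "Robin"}
female = {"Elizabeth Swann", "Jenny"}


def replace_name_chunks(tree, male, female):
    """Replace each top-level Name chunk whose full name is listed by a
    single ('Male' or 'Female', pos) leaf, modifying `tree` in place."""
    for i, child in enumerate(tree):
        # Plain leaves are (word, pos) tuples; only Name subtrees are chunks.
        if not (isinstance(child, Tree) and child.label() == "Name"):
            continue
        full_name = " ".join(word for word, _ in child)
        pos = child[-1][1]  # reuse the POS tag of the chunk's last token
        if full_name in male:
            tree[i] = ("Male", pos)    # index assignment mutates the parent
        elif full_name in female:
            tree[i] = ("Female", pos)
    return tree


# A small hand-built tree, so no tokenizer or tagger models are needed.
chunked = Tree("S", [
    Tree("Name", [("Captain", "NNP"), ("Jack", "NNP"), ("Sparrow", "NNP")]),
    ("arrives", "VBZ"),
    ("in", "IN"),
    Tree("Name", [("Port", "NNP"), ("Royal", "NNP")]),
    ("rescuing", "VBG"),
    Tree("Name", [("Elizabeth", "NNP"), ("Swann", "NNP")]),
])

replace_name_chunks(chunked, male, female)
print(chunked)
```

Chunks that match neither list, such as Port Royal here, are left untouched as Name subtrees.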
answered 2018-06-20T09:49:19.843