python - Searching for a keyword as a noun

Question

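import nltk
from nltk import ne_chunk, tokenize, word_tokenize

que = "What's the weather like?"
lines_list = tokenize.sent_tokenize(que)
for text in lines_list:
    tokens = word_tokenize(text)       # split the sentence into words
    nouns = nltk.pos_tag(tokens)       # attach a part-of-speech tag to each word
    chunked = ne_chunk(nouns)          # group named entities into a tree
    print(chunked)  # (S What/WP 's/VBZ the/DT weather/NN like/IN ?/.)
if ("weather/NN") in chunked:
    print("I found weather as noun")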

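If you run this code, it does not seem to recognize that "weather/NN" is in chunked, and I don't understand why. Is there something wrong with my code?

Thanks for your help.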

score 1 · Accepted Answer

The problem is that chunked is not a string but a sequence of 2-tuples:

[('What', 'WP'), ("'s", 'VBZ'), ('the', 'DT'), ('weather', 'NN'), ('like', 'IN'), ('?', '.')]

So the tuple is what you should check for:

if ("weather", "NN") in chunked:
    print("I found weather as noun")
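To make the difference concrete, here is a minimal sketch that does not need NLTK at all. The list `tagged` is a hand-written stand-in for the (word, tag) pairs shown above, and `is_noun` is a hypothetical helper (not part of NLTK) that accepts any tag starting with "NN", on the assumption that you also want plural and proper nouns to count:

```python
# Hand-written stand-in for the (word, tag) pairs shown above; no NLTK needed.
tagged = [("What", "WP"), ("'s", "VBZ"), ("the", "DT"),
          ("weather", "NN"), ("like", "IN"), ("?", ".")]

# A str is never equal to a tuple, so this membership test is always False.
found_as_string = "weather/NN" in tagged

# Comparing a tuple against the tuples in the list works as expected.
found_as_tuple = ("weather", "NN") in tagged


def is_noun(word, pairs):
    """Hypothetical helper: True if `word` carries any noun tag (NN, NNS, NNP, NNPS)."""
    return any(w == word and t.startswith("NN") for w, t in pairs)


print(found_as_string, found_as_tuple, is_noun("weather", tagged))
```

The exact tuple test is the simplest fix; the prefix match is only worth it if the precise noun tag does not matter to you.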

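More generally, you can debug this kind of thing by looking at the actual values rather than just printing their str representation. For example: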

for chunk in chunked:
    print(type(chunk), chunk)

... is how you would find out that it is a sequence of tuples, because it prints:

<class 'tuple'> ('What', 'WP')

... whereas a string would print:

<class 'str'> W
<class 'str'> h
<class 'str'> a

... because a string is a sequence of characters, not a sequence of tuples.

These look like tuples of strings. But if you want to check for certain:

for chunk, typ in chunked:
    print(type(chunk), chunk, type(typ), typ)

If they are strings, you will get something like:

<class 'str'> 'What' <class 'str'> 'WP'

... and then the code above will work. If you see something like this instead:

<class 'nltk._spam.Eggs'> 'What' <class 'str'> 'WP'

... then you probably can't just write ("weather", "NN"); you'd have to look into how to construct an Eggs object.
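A related case where a plain membership test can miss: nltk.Tree is a subclass of list, and when ne_chunk does recognize a named entity, that entity becomes a nested subtree rather than a top-level tuple. The sketch below imitates this with ordinary nested lists, so it runs without NLTK; `iter_leaves` is a hypothetical helper and `nested` is made-up sample data. On a real Tree, the existing `leaves()` method should give you the same flat list of pairs.

```python
def iter_leaves(node):
    """Yield the (word, tag) tuples from a nested list, depth first."""
    for child in node:
        if isinstance(child, list):   # a subtree (nltk.Tree subclasses list)
            yield from iter_leaves(child)
        else:                         # a plain (word, tag) leaf
            yield child


# Made-up data: the first item plays the role of a named-entity subtree.
nested = [[("Paris", "NNP")], ("has", "VBZ"), ("nice", "JJ"), ("weather", "NN")]

# `in` only compares against top-level items, so the nested pair is missed.
top_level_hit = ("Paris", "NNP") in nested

# Walking the leaves finds pairs at any depth.
deep_hit = ("Paris", "NNP") in iter_leaves(nested)

print(top_level_hit, deep_hit)
```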

score 0 · Accepted Answer

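chunked is a chunk structure that overrides its __str__() or __repr__() method so that it prints as a nice string, but it is not itself a string, so you can't use in to test whether it contains another string. Try if ("weather/NN") in str(chunked):.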
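One caveat with the string approach is that in on a string is a substring test, so "weather/NN" would also match a word tagged NNS. A rough sketch of the difference, using a hand-written string modeled on the printed output in the question (with the tag changed to NNS for illustration) in place of a real str(chunked):

```python
# Hand-written stand-in for str(chunked); the NNS tag is changed on purpose.
rendered = "(S What/WP 's/VBZ the/DT weather/NNS like/IN ?/.)"

# Substring test: matches even though the tag is NNS, not NN.
substring_hit = "weather/NN" in rendered

# Whole-token test: split on whitespace and compare complete word/tag tokens.
# This simple split assumes a flat tree with no nested parentheses.
tokens = rendered.strip("()").split()
token_hit = "weather/NN" in tokens

print(substring_hit, token_hit)
```

If the exact tag matters, comparing (word, tag) tuples directly is probably less fragile than parsing the printed form.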
