python - How to get combined words after POS tagging?

Question

I'm working on a dataset where I need to extract all adjectives, verbs, and adverbs from each sentence in a column of a DataFrame.

Here is an example of what I'm trying in order to get the desired output:

```python
import nltk

list1 = ['good', 'excellent', 'was', 'not']
for i in list1:
    # Tag each word on its own, without the surrounding sentence
    x = nltk.pos_tag([i])
    # print(x)
    if x[0][1] in ("JJ", "JJS", "RB", "VB", "RBR", "RBS", "VBN", "VBP"):
        print(x)
```

The output it gives me is:

```
[('good','JJ')]
[('not','RB')]
```

The output I need is:

```
good not
```

Can anyone help?
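A minimal sketch of producing `good not` on a single line: instead of printing each tagged tuple, collect the matching words and join them with spaces. The function name `join_focus_words` is hypothetical, and the tags in the example are hardcoded for illustration. Only `good`/`JJ` and `not`/`RB` come from the output above; the tags shown for `excellent` and `was` are assumptions (any tag outside the list gives the same result).

```python
# Tags the question's loop keeps (same list as in the question).
FOCUS_TAGS = {"JJ", "JJS", "RB", "RBR", "RBS", "VB", "VBN", "VBP"}


def join_focus_words(tagged, focus_tags=FOCUS_TAGS):
    """Return the words whose tag is in focus_tags, joined by single spaces.

    `tagged` is a list of (word, tag) pairs, e.g. the output of nltk.pos_tag.
    """
    return " ".join(word for word, tag in tagged if tag in focus_tags)


# Hardcoded for illustration: "good"/JJ and "not"/RB match the question's output;
# the tags for "excellent" and "was" are assumed (both fall outside FOCUS_TAGS).
tagged = [("good", "JJ"), ("excellent", "NN"), ("was", "VBD"), ("not", "RB")]
print(join_focus_words(tagged))  # good not
```

To mirror the question's word-by-word tagging with NLTK, `tagged` could be built as `[nltk.pos_tag([w])[0] for w in list1]`.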

score 0 · Accepted Answer

You'll need to be more specific about what you actually want to extract, but here is an attempt.

It looks like you're trying to extract verb phrases together with their adjectives/adverbs. If so, you could try:

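```python
from nltk import ngrams, pos_tag, word_tokenize

# Requires NLTK's tokenizer and tagger data (see nltk.download);
# the exact resource names vary between NLTK versions.
text = "this is not good."
tagged_text = pos_tag(word_tokenize(text))

# Adjective, adverb and verb tags; 'VBZ' is included because "is" is tagged VBZ
focus_tags = {'JJ', 'JJS', 'RB', 'RBR', 'RBS', 'VB', 'VBN', 'VBP', 'VBZ'}

# Print every pair of adjacent tokens whose tags are both in focus_tags
for (token1, tag1), (token2, tag2) in ngrams(tagged_text, 2):
    if tag1 in focus_tags and tag2 in focus_tags:
        print(token1 + ' ' + token2)
```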

But it outputs `is not` and `is not good`!

Well, in that case, do you want exactly `not good`, or `is not good`?

If it's the `is not good` trigram, try:

```python
# Reuses tagged_text and focus_tags from the previous snippet
for (token1, tag1), (token2, tag2), (token3, tag3) in ngrams(tagged_text, 3):
    if tag1 in focus_tags and tag2 in focus_tags and tag3 in focus_tags:
        print(token1 + ' ' + token2 + ' ' + token3)
```
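The bigram and trigram loops differ only in the window size, so one option is a helper that takes `n` as a parameter. This sketch is pure Python: `focus_ngrams` is a hypothetical name, it builds the sliding windows with `zip` instead of `nltk.ngrams`, and the tags are hardcoded as an assumption of what `pos_tag(word_tokenize("this is not good."))` returns.

```python
# Adjective/adverb/verb tags, plus "VBZ" so that "is" can match (assumption:
# the tagger labels "is" as VBZ, the Penn Treebank tag for 3rd-person present).
FOCUS_TAGS = {"JJ", "JJS", "RB", "RBR", "RBS", "VB", "VBN", "VBP", "VBZ"}


def focus_ngrams(tagged, focus_tags, n):
    """Return every run of n consecutive tokens whose tags are all in focus_tags."""
    # zip over n shifted copies yields sliding windows, like nltk.ngrams(tagged, n)
    windows = zip(*(tagged[i:] for i in range(n)))
    return [
        " ".join(token for token, _ in window)
        for window in windows
        if all(tag in focus_tags for _, tag in window)
    ]


# Assumed output of pos_tag(word_tokenize("this is not good.")).
tagged = [("this", "DT"), ("is", "VBZ"), ("not", "RB"), ("good", "JJ"), (".", ".")]
print(focus_ngrams(tagged, FOCUS_TAGS, 2))  # ['is not', 'not good']
print(focus_ngrams(tagged, FOCUS_TAGS, 3))  # ['is not good']
```

Returning a list instead of printing makes it easier to apply the helper to every row of a DataFrame column.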

What if I just want `not good`?

Maybe try removing the verb tags? For example:

```python
from nltk import ngrams, pos_tag, word_tokenize

text = "this is not good."
tagged_text = pos_tag(word_tokenize(text))

# Adjective and adverb tags only, so verbs such as "is" no longer match
focus_tags = {'JJ', 'JJS', 'RB', 'RBR', 'RBS'}

# Print every pair of adjacent tokens whose tags are both in focus_tags
for (token1, tag1), (token2, tag2) in ngrams(tagged_text, 2):
    if tag1 in focus_tags and tag2 in focus_tags:
        print(token1 + ' ' + token2)
```
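A different approach, not taken from the answer above, avoids choosing `n` at all: group consecutive tokens by whether their tag is in the focus set and keep each maximal run. This sketch uses `itertools.groupby`; `focus_runs` is a hypothetical name and the tags are hardcoded as an assumption of the tagger's output. With verb tags included it returns the whole phrase `is not good`; with only adjective and adverb tags it returns `not good`.

```python
from itertools import groupby

# "VBZ" is included on the assumption that "is" is tagged VBZ.
VERB_ADJ_ADV = {"JJ", "JJS", "RB", "RBR", "RBS", "VB", "VBN", "VBP", "VBZ"}
ADJ_ADV = {"JJ", "JJS", "RB", "RBR", "RBS"}


def focus_runs(tagged, focus_tags):
    """Return each maximal run of consecutive tokens whose tags are in focus_tags."""
    runs = []
    # groupby starts a new group wherever the "tag in focus_tags" key changes
    for in_focus, group in groupby(tagged, key=lambda pair: pair[1] in focus_tags):
        if in_focus:
            runs.append(" ".join(token for token, _ in group))
    return runs


# Assumed output of pos_tag(word_tokenize("this is not good.")).
tagged = [("this", "DT"), ("is", "VBZ"), ("not", "RB"), ("good", "JJ"), (".", ".")]
print(focus_runs(tagged, VERB_ADJ_ADV))  # ['is not good']
print(focus_runs(tagged, ADJ_ADV))       # ['not good']
```

Unlike the n-gram loops, this also returns a single matching word as a one-word run; filter with `len(run.split()) > 1` if only multi-word phrases are wanted.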
