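I have the same problem as the one discussed in this link, Python extract sentence contains word, except that I want to find 2 words in the same sentence. I need to extract the sentences in a corpus that contain 2 specific words. Can anyone help me?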
Viewed 5896 times
3 answers
2
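This is straightforward with the TextBlob package and Python's built-in sets.
Basically, iterate over the sentences in the text and check whether every search word appears in the sentence's set of words, that is, whether the search words are a subset of it. (A non-empty intersection would only show that at least one of the words is present.)
# Older TextBlob releases used: from text.blob import TextBlob
from textblob import TextBlob
search_words = {"buy", "apples"}
blob = TextBlob("I like to eat apple. Me too. Let's go buy some apples.")
matches = []
for sentence in blob.sentences:
    words = set(sentence.words)
    if search_words <= words:  # subset test: both search words occur in the sentence
        matches.append(str(sentence))
print(matches)
# ["Let's go buy some apples."]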
Update: or, more Pythonically,
# Older TextBlob releases used: from text.blob import TextBlob
from textblob import TextBlob
search_words = {"buy", "apples"}
blob = TextBlob("I like to eat apple. Me too. Let's go buy some apples.")
# Keep only the sentences whose word set contains every search word.
matches = [str(s) for s in blob.sentences if search_words <= set(s.words)]
print(matches)
# ["Let's go buy some apples."]
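If you would rather not install TextBlob, the same set-based idea can be sketched with the standard library alone. The helper name `sentences_with_all` and the naive regex sentence splitter are my own assumptions for illustration, not part of the answer above. The subset test `required <= words` keeps only sentences that contain every search word.

```python
import re


def sentences_with_all(text, search_words):
    """Return the sentences whose word set contains every search word."""
    required = {w.lower() for w in search_words}
    # Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    matches = []
    for sentence in sentences:
        # Tokenize on word characters, so "apples" stays distinct from "apple".
        words = set(re.findall(r"\w+", sentence.lower()))
        if required <= words:  # subset: all required words are present
            matches.append(sentence)
    return matches


text = "I like to eat apple. Me too. Let's go buy some apples."
print(sentences_with_all(text, ["buy", "apples"]))
```

A real sentence tokenizer copes with abbreviations and quotations far better than this splitter, so treat it as a stand-in only.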
answered 2013-08-30T20:56:25.003
2
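If this is what you mean:
import re
txt = "I like to eat apple. Me too. Let's go buy some apples."
define_words = 'some apple'
# Matches any sentence (text up to the next '.') that contains the literal phrase in define_words.
print(re.findall(r"([^.]*?%s[^.]*\.)" % re.escape(define_words), txt))
# Output: [" Let's go buy some apples."]
You can also try:
define_words = input("Enter string: ")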
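Note that the pattern above matches only when the text of `define_words` appears literally, so the two words must be adjacent and in that order. As a hedged alternative that stays within a single regular expression, two lookaheads can require both words anywhere in the sentence. The helper `find_sentences_with_both` is a hypothetical name, and the sketch assumes that sentences end with a period.

```python
import re


def find_sentences_with_both(text, first, second):
    """Hypothetical helper: sentences ending in '.' that contain both words, in any order."""
    # Each lookahead scans ahead within the current sentence ([^.]* cannot cross a period)
    # for one whole word; the trailing [^.]*\. then consumes the sentence itself.
    pattern = r"(?=[^.]*\b{0}\b)(?=[^.]*\b{1}\b)[^.]*\.".format(
        re.escape(first), re.escape(second)
    )
    return [match.strip() for match in re.findall(pattern, text)]


txt = "I like to eat apple. Me too. Let's go buy some apples."
print(find_sentences_with_both(txt, "apples", "go"))
```

Because of the `\b` word boundaries, `apple` does not match inside `apples` here, unlike the literal-phrase pattern.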
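To check whether a sentence contains all of the defined words:
import re
txt = "I like to eat apple. Me too. Let's go buy some apples."
words = 'go apples'.split(' ')
# Split the text into sentences, each ending with a period.
sentences = re.findall(r"([^.]*\.)", txt)
for sentence in sentences:
    # Substring test: "go" would also match inside a word such as "going".
    if all(word in sentence for word in words):
        print(sentence)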
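Since `word in sentence` is a substring test, it also accepts partial matches such as `apple` inside `apples`. If whole-word matching is wanted, one possible sketch uses `\b` word boundaries. The helper name `contains_all_words` is an assumption for illustration, and the comparison with the substring test is there only to show the difference.

```python
import re


def contains_all_words(sentence, words):
    """True if every word occurs in the sentence as a whole word (case-insensitive)."""
    return all(
        re.search(r"\b%s\b" % re.escape(word), sentence, re.IGNORECASE)
        for word in words
    )


txt = "I like to eat apple. Me too. Let's go buy some apples."
sentences = [s.strip() for s in re.findall(r"[^.]*\.", txt)]
words = ["some", "apple"]

# Whole-word matching rejects "apples" when searching for "apple".
whole_word_matches = [s for s in sentences if contains_all_words(s, words)]
# Substring matching accepts it.
substring_matches = [s for s in sentences if all(w in s for w in words)]

print(whole_word_matches)
print(substring_matches)
```

Which behaviour is right depends on whether inflected forms should count as hits; stemming or lemmatizing the tokens first is another option.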
answered 2013-08-30T09:17:00.543
1
I think you want an answer that uses nltk. And I assume the two words do not need to be consecutive, right?
>>> from nltk.tokenize import sent_tokenize
>>> text = "I like to eat apple. Me too. Let's go buy some apples."
>>> words = ['like', 'apple']
>>> sentences = sent_tokenize(text)
>>> for sentence in sentences:
...     # Substring test: keep sentences that contain every word in `words`.
...     if all(word in sentence for word in words):
...         print(sentence)
...
I like to eat apple.
answered 2013-08-30T10:13:47.970