Suppose I have a paragraph:

text = '''Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]'''

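If I input a word (say, favoured), how can I remove the entire sentence that contains it? The method I used before was tedious: I used sent_tokenize to split the paragraph (which is over 13,000 words) into sentences, and since I have more than 1,000 words to check, I ran a loop that tested every word against every sentence. That takes a long time, because there are more than 400 sentences.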

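Instead, I would like to search the paragraph for those 1,000 words and, whenever one is found, select everything before it back to the previous full stop and everything after it up to the next full stop.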
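The approach described above can be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the full paragraph: the function name cut_sentence_around and the sample string are made up for the example, "." is assumed to be the only sentence boundary, and only the first occurrence of the word is handled. Because it uses str.find, it also matches the word inside longer words.

```python
def cut_sentence_around(text, word):
    """Remove the first sentence containing `word`, treating "." as the only boundary."""
    pos = text.find(word)
    if pos == -1:
        return text  # word not present, nothing to remove
    # Position just after the previous full stop (0 if there is none).
    start = text.rfind(".", 0, pos) + 1
    # Position just after the full stop that closes the sentence (end of text if none).
    end = text.find(".", pos)
    end = len(text) if end == -1 else end + 1
    return text[:start] + text[end:]


sample = "Cats purr. Dogs bark loudly. Birds sing."
print(cut_sentence_around(sample, "bark"))  # Cats purr. Birds sing.
```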

3 Answers


I'm not sure I understand your question, but you could do something like this:

text = 'whatever....'
sentences = text.split('.')
good_sentences = [e for e in sentences if 'my_word' not in e]

Is that what you are looking for?
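One caveat: 'my_word' not in e is a substring test, so a short word such as "to" would also remove sentences containing "today". With around 1,000 words to check, it may also help to combine them into one compiled pattern, so that each sentence needs a single search call instead of a Python-level loop over every word. The sketch below assumes "." is the only sentence delimiter; remove_sentences and the sample text are hypothetical names used only for illustration.

```python
import re


def remove_sentences(text, words):
    """Drop every "."-delimited sentence containing any of `words` as a whole word."""
    if not words:
        return text  # an empty alternation would match everywhere, so bail out early
    # \b anchors the match at word boundaries; re.escape guards against regex metacharacters.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return ".".join(s for s in text.split(".") if not pattern.search(s))


sample = "I want to go. See you today. The sea is calm."
print(repr(remove_sentences(sample, ["to", "sea"])))  # ' See you today.'
```

The middle sentence survives because "today" does not contain "to" as a whole word.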

answered 2013-09-25T10:55:50.517

This removes every sentence (anything delimited by ".") that contains the word somewhere.

def remove_sentence(text, word):
    # Split on ".", drop each piece containing `word` (substring match), then rejoin.
    return ".".join(sentence for sentence in text.split(".")
                    if word not in sentence)

>>> remove_sentence(text, "published")
"[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"
>>>
>>> remove_sentence(text, "favoured")
"Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"
answered 2013-09-25T10:56:39.520

You might be interested in trying something like the following program:

import re

SENTENCES = ('This is a sentence.',
             'Hello, world!',
             'Where do you want to go today?',
             'The apple does not fall far from the tree.',
             'Sally sells sea shells by the sea shore.',
             'The Jungle Book has several stories in it.',
             'Have you ever been up to the moon?',
             'Thank you for helping with my problem!')

BAD_WORDS = frozenset(map(str.lower, ('to', 'sea')))

def main():
    for sentence in SENTENCES:
        if frozenset(words(sentence.lower())) & BAD_WORDS:
            print('Delete:', repr(sentence))

# Raw string avoids the invalid-escape warning for \w in newer Python versions.
words = lambda sentence: (m.group() for m in re.finditer(r'\w+', sentence))

if __name__ == '__main__':
    main()

Explanation

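  1. You start with the sentences you want to filter and the words you are looking for.
  2. You compare each sentence's set of words with the set of words you are looking for.
  3. If the two sets intersect, the sentence you are looking at is one you would delete.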

Output

Delete: 'Where do you want to go today?'
Delete: 'Sally sells sea shells by the sea shore.'
Delete: 'Have you ever been up to the moon?'
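The program above only reports which sentences would be deleted. As a rough sketch of the same set-comparison steps that actually returns the remaining sentences, something like the following might work. The name keep_sentences and the sample list are assumptions made for this illustration, and the input is assumed to be already split into sentences.

```python
import re


def keep_sentences(sentences, bad_words):
    """Return the sentences that share no word with `bad_words` (case-insensitive)."""
    bad = frozenset(w.lower() for w in bad_words)
    # isdisjoint is True when the sentence's words and `bad` have nothing in common.
    return [s for s in sentences
            if bad.isdisjoint(re.findall(r"\w+", s.lower()))]


sample = ["Where do you want to go today?", "Hello, world!", "Sally sells sea shells."]
print(keep_sentences(sample, ["to", "sea"]))  # ['Hello, world!']
```

Since membership in a frozenset is a hash lookup, the cost per sentence depends mainly on the sentence's length rather than on how many words are being checked.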
answered 2013-09-25T14:18:00.977