python - Text search with Python

Question

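I am working on a text search project and use TextBlob to find sentences in a text. TextBlob extracts every sentence that contains one of the keywords. For useful research, however, I also want to extract the sentence before and the sentence after each match, and I can't figure out how to do that.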

Here is the code I'm using:

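from textblob import TextBlob

def extract_sents(Text, word):
    search_words = set(word.split(','))          # comma-separated keywords
    sents = ''.join([s.lower() for s in Text])   # lowercase the text before matching
    blob = TextBlob(sents)
    # keep every sentence that shares at least one word with the keywords
    matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
    print(search_words)
    print(matches)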

score 1 · Accepted Answer

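If you want the sentences before and after each match, you can either write a loop that remembers the previous sentence, or slice blob.sentences with [from:to], just as you would any list.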
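A minimal sketch of the loop-based alternative, assuming the sentences are already a plain list of strings (for example `[str(s) for s in blob.sentences]`) and that keywords are lowercase. A crude whitespace split stands in for TextBlob's `s.words`, and the function name `context_by_loop` is hypothetical. The loop remembers the previous sentence and holds each match open until the following sentence arrives.

```python
def context_by_loop(sentences, keywords):
    """Return [previous, match, next] groups in a single pass over the sentences."""
    results = []
    previous = None   # sentence seen on the previous iteration
    pending = None    # a match still waiting for its "next" sentence
    for sentence in sentences:
        if pending is not None:
            pending.append(sentence)   # this sentence is the "after" context
            results.append(pending)
            pending = None
        # assumption: crude tokenizer instead of TextBlob's s.words
        words = {w.strip(".,;:!?").lower() for w in sentence.split()}
        if keywords & words:
            pending = ([previous] if previous is not None else []) + [sentence]
        previous = sentence
    if pending is not None:   # a match in the last sentence has no "next"
        results.append(pending)
    return results
```

Because the pending window is closed before the current sentence is checked, two consecutive matches each get their own window.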

The best way is probably to use the built-in enumerate function.

match_region = [list(map(str, blob.sentences[i-1:i+2]))  # from previous to next sentence
                for i, s in enumerate(blob.sentences)     # i is the index, s is the sentence
                if search_words & set(s.words)]           # same condition as in the question

Here, blob.sentences[i-1:i+2] extracts the sublist from index i-1 (inclusive) to index i+2 (exclusive), and map converts the elements of that sublist to strings.
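To check the slice arithmetic without installing TextBlob, here is the same comprehension run on a plain list of strings. The `sentences` list and the hypothetical `words_of` helper are assumptions that stand in for `blob.sentences` and `s.words`.

```python
sentences = ["One.", "Two python.", "Three.", "Four.", "Five python."]
search_words = {"python"}

def words_of(sentence):
    # crude stand-in for TextBlob's s.words: split on spaces, strip punctuation
    return {w.strip(".,;:!?").lower() for w in sentence.split()}

match_region = [sentences[i-1:i+2]        # index i-1 (inclusive) to i+2 (exclusive)
                for i, s in enumerate(sentences)
                if search_words & words_of(s)]

print(match_region)
# [['One.', 'Two python.', 'Three.'], ['Four.', 'Five python.']]
```

The second match is the last sentence, so i+2 runs past the end of the list and the slice simply stops there.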

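Note: in practice you will probably want to replace i-1 with max(0, i-1). Otherwise, when the match is the first sentence, i-1 is -1, which Python interprets as the last element, so the slice usually comes out empty. i+2, on the other hand, is not a problem when it exceeds the length of the list, because slicing simply stops at the end.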
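A sketch of the clamped version, generalized to a configurable number of sentences before and after the match. The function name `context_windows`, its `before`/`after` parameters and the `words_of` helper are hypothetical; plain strings stand in for TextBlob sentences, and keywords are assumed to be lowercase.

```python
def words_of(sentence):
    # crude stand-in for TextBlob's s.words (an assumption, not TextBlob's tokenizer)
    return {w.strip(".,;:!?").lower() for w in sentence.split()}

def context_windows(sentences, search_words, before=1, after=1):
    """Return each matching sentence together with its neighbours.

    max(0, ...) keeps the start index from going negative, which would
    otherwise make Python count from the end of the list.
    """
    return [sentences[max(0, i - before):i + after + 1]
            for i, s in enumerate(sentences)
            if search_words & words_of(s)]

sentences = ["Python is fun.", "It reads well.", "Tests help.", "Ship it."]

# Unclamped slice for a match at index 0: [-1:2] becomes [3:2], which is empty
print(sentences[0 - 1:0 + 2])
# []
print(context_windows(sentences, {"python"}))
# [['Python is fun.', 'It reads well.']]
```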
