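I have a program that extracts text around specific keywords. I am trying to modify it so that if two keywords are close enough together, it shows one longer snippet of text instead of two separate snippets.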

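My current code is below. It appends the words after a keyword to a list and resets the counter if another keyword is found. However, I have run into two problems. The first is that the data rate limit in my Spyder notebook is exceeded, and I have not been able to get around it. The second is that although this produces longer snippets, it does not eliminate the duplicates.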

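Does anyone know a way to get rid of the duplicate snippets, or how to merge snippets in a way that does not exceed the data rate limit (or how to change the Spyder rate limit)? Thank you!!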

def occurs(word1, word2, file, filewrite):
    import os


    infile = open(file,'r')     #opens file, reads, splits into lines
    lines = infile.read().splitlines()
    infile.close()
    wordlist = [word1, word2]       #this list allows for multiple words
    wordsString = ' '.join(lines)      #joins the lines back into one string
    words = wordsString.split()        #splits the text into individual words


    f = open(file, 'w')
    f.write("start")
    f.write(os.linesep)

    g = open(filewrite,'w')
    g.write("start")
    g.write(os.linesep)    

    for item in wordlist:        #multiple words
        matches = [i for i, w in enumerate(words) if w.lower().find(item) != -1] 
              #above line goes through lines, finds the specific words we want

        for m in matches:        #next three lines find each instance of the word, print out surrounding words
            list = []
            s = ""
            l = " ".join(words[m-20:m+1])
            j = 0
            while j < 20:
                list.append(words[m+j])
                j = j+1
                if words[m+j] == word1 or words[m+j] == word2:
                    j = 0
                    print (list)
            k = " ".join(list)

            f.write(f"{s}...{l}{k}...")          #writes the data to the external file
            f.write(os.linesep)
            g.write(str(m))
            g.write(os.linesep)
    f.close()
    g.close()
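
For reference, one possible way to get merged snippets without duplicates is to treat every keyword hit as a window of word indices and merge windows that overlap before building any text. The sketch below is only an illustration under some assumptions: the helper names `merge_windows` and `extract_snippets` and the `before`/`after` parameters are hypothetical and not part of the code above, and keywords are matched as lowercase substrings, as in the `matches` line. It collects all hits for all keywords first, so two nearby keywords end up in a single snippet, and it prints nothing inside the loop, which should also keep console output small.

```python
def merge_windows(match_indices, before=20, after=20, n_words=None):
    """Turn keyword positions into merged (start, end) word-index ranges.

    `end` is exclusive. Windows that overlap or touch are combined,
    so nearby keywords produce one longer range instead of two.
    """
    windows = []
    for m in sorted(match_indices):
        start = max(0, m - before)
        end = m + after + 1
        if n_words is not None:
            end = min(end, n_words)
        if windows and start <= windows[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [tuple(w) for w in windows]


def extract_snippets(words, keywords, before=20, after=20):
    """Return one text snippet per merged window around any keyword."""
    keywords = [k.lower() for k in keywords]
    # Positions of every word that contains any of the keywords.
    matches = [i for i, w in enumerate(words)
               if any(k in w.lower() for k in keywords)]
    return [" ".join(words[s:e])
            for s, e in merge_windows(matches, before, after, len(words))]


words = "a b c cat d e f g h i j dog k l m".split()
far_apart = extract_snippets(words, ["cat", "dog"], before=1, after=1)
close_enough = extract_snippets(words, ["cat", "dog"], before=4, after=4)
```

With a window of one word on each side, the two keywords stay in separate snippets; with four words on each side, the windows overlap and a single snippet is returned. Each snippet could then be written to the output file once, in the same `...snippet...` format as above.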