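I work in the office of one of my professors at my university, and he has asked me to read through an entire class's papers to try to catch people who plagiarized. I decided to write a Python program that collects every six-word phrase in every paper and compares them to see whether any two papers have more than 200 matching phrases. For example, the six-word phrases for
"I ate a potato and it was good." would be:
I ate a potato and it
ate a potato and it was
a potato and it was good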
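To make the idea of "six-word phrases" concrete, here is a minimal sketch of the sliding window described above. The function name `six_word_phrases` and the `size` parameter are made up for illustration; the cleanup step (lowercase, keep only a-z and spaces) mirrors what `ReadFile` does in the code below.

```python
import re

def six_word_phrases(text, size=6):
    """Return every run of `size` consecutive words in `text` (hypothetical helper)."""
    # Same cleanup as ReadFile: lowercase, then keep only a-z and spaces.
    cleaned = re.sub("[^a-z ]", "", text.lower())
    # split() with no argument ignores repeated spaces, so no empty "words" appear.
    words = cleaned.split()
    # One window per starting position; fewer than `size` words yields an empty list.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

print(six_word_phrases("I ate a potato and it was good."))
# ['i ate a potato and it', 'ate a potato and it was', 'a potato and it was good']
```

An eight-word sentence gives three windows, matching the example above.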
My code is currently:
    import re
    import glob
    import os

    def ReadFile(Filename):
        try:
            F = open(Filename)
            F2=F.read()
        except IOError:
            print("Can't open file:",Filename)
            return []
        F3=re.sub("[^a-z ]","",F2.lower())
        return F3

    def listEm(BigString):
        list1=[]
        list1.extend(BigString.split(' '))
        return list1

    Name = input ('Name of folder? ')
    Name2=[]
    Name3=os.chdir("Documents")
    for file in glob.glob("*txt"):
        Name2.append(file)

    for file in Name2:
        index1=0
        index2=6
        new_list=[]
        Words = ReadFile(file)
        Words2= listEm(Words)
        while index2 <= len(Words2):
            new_list.append(Words2[index1:index2])
            index1 += 1
            index2 += 1
        del Name2[0] ##Deletes first file from list of files so program wont compare the same file to itself.

        for file2 in Name2:
            index=0
            index1=6
            new_list2=[]
            Words1= ReadFile(file2)
            Words3= listEm(Words)
            while index1 <= len(Words3):
                new_list2.append(Words3[index:index1]) ##memory error
                index+=1
                index2+=1

            results=[]
            for element in new_list:
                if element in new_list2:
                    results.append(element)
            if len(results) >= 200:
                print("You may want to examine the following files:",file1,"and",file2)
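I get a memory error on this line:
    new_list2.append(Words3[index:index1])
I can't figure out what I'm doing wrong; in my short, one-semester programming career I have never gotten a memory error before. Thank you for any help.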
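One observation on the code above: the inner `while` loop tests `index1` but increments `index` and `index2`, so its condition never becomes false and `new_list2` keeps growing, which would be consistent with a memory error on that `append`. The inner loop also builds `Words3` from `Words` rather than `Words1`. Below is a minimal sketch of the pairwise comparison that avoids manual index bookkeeping. It is not the original program: `phrase_set` and `suspicious_pairs` are hypothetical names, the papers are passed in as a dict of strings instead of being read from a folder, and it counts distinct shared phrases (via sets) rather than every occurrence.

```python
import re
from itertools import combinations

def phrase_set(text, size=6):
    """Set of all runs of `size` consecutive words in `text` (hypothetical helper)."""
    words = re.sub("[^a-z ]", "", text.lower()).split()
    # Tuples are hashable, so they can live in a set for fast lookups.
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def suspicious_pairs(papers, threshold=200, size=6):
    """papers maps a name to its text; returns (name_a, name_b, shared_count)
    for every pair sharing at least `threshold` distinct phrases."""
    phrases = {name: phrase_set(text, size) for name, text in papers.items()}
    flagged = []
    # combinations() yields each unordered pair once and never pairs a paper with itself.
    for a, b in combinations(sorted(phrases), 2):
        shared = len(phrases[a] & phrases[b])  # set intersection
        if shared >= threshold:
            flagged.append((a, b, shared))
    return flagged

# Tiny in-memory demo; the threshold is lowered because the texts are so short.
papers = {
    "a.txt": "I ate a potato and it was good.",
    "b.txt": "Yesterday I ate a potato and it was good, really.",
    "c.txt": "Nothing in common with the other two papers here at all.",
}
print(suspicious_pairs(papers, threshold=2))
# [('a.txt', 'b.txt', 3)]
```

Using `combinations` removes the need to delete entries from the file list while looping over it, and set intersection replaces the `element in new_list2` scan, which searches the whole list for every phrase.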