我是编程新手,我正在运行这个脚本来清理一个大文本文件(超过 12000 行)并将其写入另一个 .txt 文件。问题是当使用较小的文件(大约 500 行左右)运行它时,它执行得很快,因此我的结论是,由于文件的大小,它需要时间。因此,如果有人可以指导我使这段代码高效,我们将不胜感激。
input_file = open('bNEG.txt', 'rt', encoding='utf-8')
l_p = LanguageProcessing()
sentences=[]
for lines in input_file.readlines():
tokeniz = l_p.tokeniz(lines)
cleaned_url = l_p.clean_URL(tokeniz)
remove_words = l_p.remove_non_englishwords(cleaned_url)
stopwords_removed = l_p.remove_stopwords(remove_words)
cleaned_sentence=' '.join(str(s) for s in stopwords_removed)+"\n"
output_file = open('cNEG.txt', 'w', encoding='utf-8')
sentences.append(cleaned_sentence)
output_file.writelines(sentences)
input_file.close()
output_file.close()
编辑:下面是答案中提到的更正代码,几乎没有其他更改以满足我的要求
input_file = open('chromehistory_log.txt', 'rt', encoding='utf-8')
output_file = open('dNEG.txt', 'w', encoding='utf-8')
l_p = LanguageProcessing()
#sentences=[]
for lines in input_file.readlines():
#print(lines)
tokeniz = l_p.tokeniz(lines)
cleaned_url = l_p.clean_URL(tokeniz)
remove_words = l_p.remove_non_englishwords(cleaned_url)
stopwords_removed = l_p.remove_stopwords(remove_words)
#print(stopwords_removed)
if stopwords_removed==[]:
continue
else:
cleaned_sentence=' '.join(str(s) for s in stopwords_removed)+"\n"
#sentences.append(cleaned_sentence)
output_file.writelines(cleaned_sentence)
input_file.close()
output_file.close()