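My CSV_Output file ends up empty, even though I'm not getting any errors. I'm trying to copy the rows from my CSV_to_Read file and add one more column. Printing article.cleaned_text works, so I suspect I'm just doing something silly here. Thanks!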

from csv import reader, writer
import unicodecsv as csv
from goose import Goose

with open('CSV_to_Read.csv','r') as csvfile:
    readCSV = csv.reader(csvfile, encoding='utf-8')
    out = writer(open("CSV_Output.csv", "a"))
    for row in readCSV:
        g = Goose({'browser_user_agent': 'Mozilla', 'parser_class':'soup'})
        try:
            article = g.extract(url=row[0])
            print article.cleaned_text
            out.writerow([row[0], row[1], row[2], row[3], row[4], row[5], row[6], article.cleaned_text, row[7], row[8], row[9]])
        except Exception:
            pass

Here you open a file object for your output file, but you never close it:

out = writer(open("CSV_Output.csv", "a"))

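The data may still be sitting in a buffer that hasn't been flushed to disk yet. One way to avoid this bug is to make sure the file object gets closed, and the file object's context manager (the `with open(path) as file:` syntax) does that for you when the block exits.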
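A small Python 3 sketch of the buffering effect, assuming default buffering (the temporary directory and `demo_buffering` are only for illustration): a short row written through `csv.writer` does not show up in the file's size on disk until the file object is closed.

```python
import csv
import os
import tempfile


def demo_buffering():
    """Return the file size right after writerow() and again after close()."""
    path = os.path.join(tempfile.mkdtemp(), "CSV_Output.csv")

    # Deliberately no with-block: the file object stays open after writerow().
    f = open(path, "a", newline="")
    csv.writer(f).writerow(["url", "text"])
    size_before_close = os.path.getsize(path)  # the row is still in Python's buffer

    f.close()  # close() flushes the buffer to disk
    size_after_close = os.path.getsize(path)
    return size_before_close, size_after_close


print(demo_buffering())
```

With a short row like this the first size is 0 and the second is the length of `url,text\r\n`; a `with` block makes the `close()` happen automatically.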

So I'd suggest changing your code to:

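import unicodecsv as csv
from goose import Goose

# Python 2: open both files in binary mode, as the csv/unicodecsv modules expect
with open('CSV_to_Read.csv', 'rb') as csvfile:
    readCSV = csv.reader(csvfile, encoding='utf-8')
    # The inner with-block closes (and therefore flushes) CSV_Output.csv when it exits
    with open("CSV_Output.csv", "ab") as outfile:
        # unicodecsv's writer encodes the unicode cleaned_text to UTF-8 on write
        out = csv.writer(outfile, encoding='utf-8')
        # One Goose instance can be reused for every URL
        g = Goose({'browser_user_agent': 'Mozilla', 'parser_class': 'soup'})
        for row in readCSV:
            try:
                article = g.extract(url=row[0])
                print article.cleaned_text
                out.writerow([row[0], row[1], row[2], row[3], row[4], row[5], row[6], article.cleaned_text, row[7], row[8], row[9]])
            except Exception as e:
                # Report failures instead of silently skipping the row
                print 'Skipped row %r: %r' % (row, e)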
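The same pattern, sketched for Python 3, where `open()` takes `newline=''` and `encoding` and the stdlib `csv` module handles Unicode directly. `add_column` and `extract_text` are hypothetical names: `extract_text` stands in for the Goose call (`g.extract(url=...).cleaned_text`) so the sketch runs without Goose or network access. It also reports failing rows instead of passing silently, since a bare `except Exception: pass` can hide the very error that explains an empty output file.

```python
import csv
import os
import tempfile


def add_column(in_path, out_path, extract_text):
    """Copy each input row to out_path with extract_text(url) inserted as column 8.

    extract_text is a hypothetical stand-in for the Goose extraction step.
    Returns the number of rows written; failing rows are printed, not hidden.
    """
    written = 0
    # Both files are managed by with-blocks, so the output is flushed and
    # closed as soon as the block exits, even if an exception escapes.
    with open(in_path, newline="", encoding="utf-8") as infile, \
         open(out_path, "a", newline="", encoding="utf-8") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            try:
                if len(row) < 10:
                    raise ValueError("expected 10 columns, got %d" % len(row))
                text = extract_text(row[0])
                # Same layout as the question: columns 0-6, the new text, then 7-9.
                writer.writerow(row[:7] + [text] + row[7:10])
                written += 1
            except Exception as exc:
                print("skipped row %r: %s" % (row[:1], exc))
    return written


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src = os.path.join(d, "CSV_to_Read.csv")
    with open(src, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(["http://example.com"] + [str(i) for i in range(1, 10)])
    print(add_column(src, os.path.join(d, "CSV_Output.csv"), lambda url: "text for " + url))
```

Because both files are closed when the `with` block exits, the output file is complete as soon as `add_column` returns.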
answered 2018-02-08T00:29:03.957