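Overcoming the confusion about the loop behavior:
Each time it is used, the names variable will be a list with only one item in it. Do this instead: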
    from pygoogle import pygoogle

    with open('parse2.txt') as fin:
        # Strip whitespace first so a trailing newline does not shield the closing bracket.
        names = [x.strip() for x in fin.read().strip().strip('\'"[]').split(' ' * 6)]

    # Open the output file once, in write mode, and keep it open for the whole loop.
    with open("output.txt", "w") as fout:
        for name in names:
            g = pygoogle(name)
            g.pages = 1
            count = g.get_result_count()
            if count == 0:
                print("[Error]: could find no result for '{}'".format(name))
            else:
                fout.write("{} {} results\n".format(name, count))
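To make the parsing step easier to check in isolation, here is a minimal sketch of the same strip-and-split logic applied to an in-memory string. The function name parse_names and the sample content are hypothetical; the real layout of parse2.txt is assumed to be a bracketed, quoted string with names separated by six spaces.

```python
def parse_names(raw):
    """Split a string such as "['a b      c d']" into a list of clean names."""
    # Drop surrounding whitespace first, then the quote and bracket characters.
    body = raw.strip().strip("'\"[]")
    # Names are assumed to be separated by exactly six spaces.
    return [part.strip() for part in body.split(" " * 6) if part.strip()]


# Hypothetical file content, built with join so the separator is exactly six spaces.
sample = "['" + (" " * 6).join(["alice smith", "bob jones", "carol white"]) + "']\n"
print(parse_names(sample))
```

Running the parser on a sample like this before querying anything is a cheap way to confirm that names really holds more than one item.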
Writing the file once, without overwriting previous queries
You need to reverse the order of the with and for statements, so that the file is opened only once:
    with open("output.txt", "w") as f:
        for line in lines:
            # Stuff...

            for name in names:
                # write() takes a single string; writelines() is meant for a list of lines
                f.write(name)
Alternatively, open the file in append mode:

    for name in names:
        with open("output.txt", "a") as f:
            f.write(name)

In that case, the data is added to the end of the file.
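The difference between the two modes can be seen in a small self-contained sketch. The names and the temporary path below are made up for illustration, and the snippet is written for Python 3.

```python
import os
import tempfile

names = ["alice", "bob"]  # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "output.txt")

# "w" truncates the file, so open it once and do all the writing inside.
with open(path, "w") as f:
    for name in names:
        f.write(name + "\n")

# "a" keeps whatever is already there and adds to the end.
with open(path, "a") as f:
    f.write("carol\n")

with open(path) as f:
    contents = f.read()
print(contents)
```

Opening with "w" inside the loop would leave only the last name in the file, which is the overwriting problem described above.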
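Transforming the data
The steps to get what you want:
- Convert your original list into a list of words.
- Group the list into pairs.
- Write out the pairs.
As follows: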
    import csv
    import re
    from itertools import chain, izip  # Python 2; on Python 3 use the built-in zip instead of izip

    A = ["blah blah", "blah blah", "blah", "list"]

    #
    # from itertools doc page
    #
    def flatten(listOfLists):
        "Flatten one level of nesting"
        return list(chain.from_iterable(listOfLists))

    def pairwise(t):
        # Passing the same iterator twice makes izip consume items two at a time.
        it = iter(t)
        return izip(it, it)

    #
    # Transform data
    #
    list_of_lists = [re.split("[ ,]", item) for item in A]
    # [['blah', 'blah'], ['blah', 'blah'], ['blah'], ['list']]
    a_words = flatten(list_of_lists)
    a_pairs = pairwise(a_words)

    with open("output.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows(a_pairs)
Written more concisely:

    A_pairs = pairwise(flatten([re.split("[ ,]", item) for item in A]))
    with open("output.csv", "wb") as f:
        csv.writer(f).writerows(A_pairs)
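The code above targets Python 2, where izip lives in itertools. For readers on Python 3, a rough equivalent might look like the sketch below; the helper name to_pairs is made up here, and the built-in zip stands in for izip.

```python
import re
from itertools import chain


def to_pairs(items):
    """Split each item into words, then group the words two at a time."""
    words = chain.from_iterable(re.split("[ ,]", item) for item in items)
    # zip pulls from the same iterator twice per tuple, which yields pairs.
    it = iter(words)
    return list(zip(it, it))


A = ["blah blah", "blah blah", "blah", "list"]
pairs = to_pairs(A)
print(pairs)
```

Note that zip stops at the shorter input, so a trailing unpaired word is silently dropped when the total word count is odd.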
Writing in the correct format
If you don't want commas in the output, simply define a custom dialect for your csv writer:
    >>> csv.register_dialect('mydialect', delimiter=' ', quoting=csv.QUOTE_MINIMAL)
    >>> csv.writer(open("try.csv", "w"), dialect="mydialect").writerows(a_ps)
This gives you what you want:

    ➤ cat try.csv
    blah blah
    blah blah
    blah list
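The same dialect can be tried without touching the disk. This is a minimal Python 3 sketch that writes into an in-memory buffer; the pairs list is sample data standing in for the real result.

```python
import csv
import io

# Space-delimited dialect; fields are quoted only when they need to be.
csv.register_dialect("mydialect", delimiter=" ", quoting=csv.QUOTE_MINIMAL)

pairs = [("blah", "blah"), ("blah", "blah"), ("blah", "list")]  # sample data

buffer = io.StringIO()
csv.writer(buffer, dialect="mydialect").writerows(pairs)
output = buffer.getvalue()
print(output)
```

With QUOTE_MINIMAL, any field that itself contains a space would be wrapped in quotes, so this format is best suited to single-word fields.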