I need the output of this script, that is, the pygoogle search results, to look like this:

name    # of results
name    # of results
name    # of results

This is what I have so far. How can I do this without rewriting the file every time?

import re
import pygoogle
import csv
from pygoogle import pygoogle
#creates list
with open('parse2.txt') as f:
    lines = [x.strip() for x in f.read().strip('\'"[]').split(' '*6)]
#googles each name in list
for line in lines:
    g = pygoogle(line)
    g.pages = 1
    names = [line + "    " + "%s results" %(g.get_result_count())]
    if (g.get_result_count()) == 0:
        print "ERROR. SEARCH NOT SUCCESSFUL. TRY AGAIN IN A FEW MINUTES."
    elif (g.get_result_count()) > 0:
        print names
        for name in names:
            with open("output.txt", "wb+") as f:
                f.writelines(name)

When I run the script, the output file only contains the most recent result, because the file is rewritten on every iteration.

3 Answers

Clearing up the confusion about the loop:

Each time names is used, the variable is a list with only one item in it. Do this instead:

from pygoogle import pygoogle

# Read the search terms once
with open('parse2.txt') as fin:
    names = [x.strip() for x in fin.read().strip('\'"[]').split(' '*6)]

# Open the output file once, in write mode, and keep it open for the whole loop
with open("output.txt", "w") as fout:
    for name in names:
        g = pygoogle(name)
        g.pages = 1
        count = g.get_result_count()  # query once and reuse the value
        if count == 0:
            print "[Error]: could find no result for '{}'".format(name)
        else:
            fout.write("{}    {} results\n".format(name, count))
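
To make the one-item-list point concrete, here is a minimal Python 3 sketch. The function names and the sample terms are hypothetical and not part of the original script. The first function rebinds names on every pass, as the question's loop does, and the second one accumulates entries instead.

```python
def rebind_each_time(lines):
    """Mimics the question's loop: names is replaced on every pass."""
    names = []
    for line in lines:
        names = [line + "    results"]  # rebinding: earlier entries are lost
    return names


def accumulate(lines):
    """Keeps every entry by appending to a single list."""
    names = []
    for line in lines:
        names.append(line + "    results")
    return names


sample = ["alice", "bob", "carol"]  # hypothetical search terms
print(len(rebind_each_time(sample)))  # 1
print(len(accumulate(sample)))        # 3
```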

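Writing the file out once

Without overwriting previous queries

You need to reverse the order of the with and for statements, so that the file is opened only once: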

with open("output.txt", "w") as f:
  for line in lines:
    # Stuff...
    for name in names:
      f.write(name + "\n")  # one entry per line
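
As a runnable illustration of the open-once pattern, here is a Python 3 sketch that does not need pygoogle. The function fake_result_count is a hypothetical stand-in for g.get_result_count() (it just returns the length of the term), and the temporary output path is an assumption made so the example does not touch real files.

```python
import os
import tempfile


def fake_result_count(term):
    """Hypothetical stand-in for pygoogle's result count; returns len(term)."""
    return len(term)


def write_counts(terms, path):
    """Open the output file once and write one line per term."""
    with open(path, "w") as fout:
        for term in terms:
            fout.write("{}    {} results\n".format(term, fake_result_count(term)))


out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_counts(["alice", "bob"], out_path)
with open(out_path) as fin:
    print(fin.read())
```

Because the file handle stays open for the whole loop, every line ends up in the file, and the file is still started fresh on each run of the script.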

Alternatively, open the file in append mode:

for name in names:
    with open("output.txt", "a") as f:
        f.write(name + "\n")  # one entry per line

In this case, the data is added to the end of the file.
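
The difference between the modes can be checked with a small Python 3 sketch. The helper write_each and the file names are hypothetical. It reopens a file for every item, the way the question's loop does, and returns what is left in the file afterwards.

```python
import os
import tempfile


def write_each(path, mode, items):
    """Reopen the file for every item, then return the final file contents."""
    for item in items:
        with open(path, mode) as f:
            f.write(item + "\n")
    with open(path) as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
items = ["first", "second", "third"]
truncated = write_each(os.path.join(tmp_dir, "w.txt"), "w", items)
plus = write_each(os.path.join(tmp_dir, "wplus.txt"), "w+", items)
appended = write_each(os.path.join(tmp_dir, "a.txt"), "a", items)
print(repr(truncated))  # 'third\n'
print(repr(plus))       # 'third\n'
print(repr(appended))   # 'first\nsecond\nthird\n'
```

Note that append mode also keeps whatever earlier runs of the script wrote, so the output file grows across runs unless it is deleted first.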

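Transforming the data

The steps to get what you want:

  1. Convert your original list into a list of words.
  2. Group the list into pairs.
  3. Write out the pairs.

As follows: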

import csv
import re
from itertools import chain, izip  # Python 2; on Python 3 use the built-in zip

A = ["blah blah", "blah blah", "blah", "list"]

#
# from itertools doc page
#
def flatten(listOfLists):
  "Flatten one level of nesting"
  return list(chain.from_iterable(listOfLists))

def pairwise(t):
  it = iter(t)
  return izip(it,it)

#
# Transform data
#
list_of_lists = [re.split("[ ,]", item) for item in A]
# [['blah', 'blah'], ['blah', 'blah'], ['blah'], ['list']]
a_words = flatten(list_of_lists)
a_pairs = pairwise(a_words)

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(a_pairs)

Written more concisely:

A_pairs = pairwise(flatten([re.split("[ ,]", item) for item in A]))
with open("output.csv", "wb") as f:
    csv.writer(f).writerows(A_pairs)
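
For readers on Python 3, where itertools.izip no longer exists, the same three steps might look like the sketch below. It assumes the built-in zip as the replacement and writes to an in-memory io.StringIO buffer instead of output.csv so that it can run anywhere; with a real file, open it in text mode with newline="".

```python
import csv
import io
import re
from itertools import chain


def pairwise(items):
    """Group a flat iterable into consecutive, non-overlapping pairs."""
    it = iter(items)
    return zip(it, it)  # Python 3's zip is lazy, like Python 2's izip


A = ["blah blah", "blah blah", "blah", "list"]

# Step 1: split every item into words and flatten one level of nesting
words = list(chain.from_iterable(re.split("[ ,]", item) for item in A))

# Step 2: group the words into pairs
pairs = list(pairwise(words))
print(pairs)

# Step 3: write the pairs out as CSV rows
buffer = io.StringIO()  # in-memory stand-in for a file opened with newline=""
csv.writer(buffer).writerows(pairs)
print(buffer.getvalue())
```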

Writing in the right format

If you do not want commas in the output, just define a custom dialect for your csv writer:

>>> csv.register_dialect('mydialect', delimiter=' ', quoting=csv.QUOTE_MINIMAL)
>>> csv.writer(open("try.csv", "w"), dialect="mydialect").writerows(a_pairs)

This gives you what you want:

➤ cat try.csv 
blah blah
blah blah
blah list
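
One thing worth knowing about a space-delimited dialect is how it treats fields that themselves contain a space. The Python 3 sketch below registers a dialect under the hypothetical name space_separated and writes to in-memory buffers, so the quoting behavior of csv.QUOTE_MINIMAL can be seen directly.

```python
import csv
import io

# Hypothetical dialect name; the settings mirror the ones used in the answer
csv.register_dialect("space_separated", delimiter=" ", quoting=csv.QUOTE_MINIMAL)

rows = [("blah", "blah"), ("blah", "blah"), ("blah", "list")]
buffer = io.StringIO()
csv.writer(buffer, dialect="space_separated").writerows(rows)
print(buffer.getvalue())

# A field that contains the delimiter is wrapped in quotes by QUOTE_MINIMAL
quoted = io.StringIO()
csv.writer(quoted, dialect="space_separated").writerow(["two words", "x"])
print(quoted.getvalue())
```

So the plain "name count" layout only holds while no single field contains a space. For names with spaces, a tab delimiter may be a safer choice.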

answered 2013-06-24T13:51:19.763

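To append to a file instead of rewriting it, open it in append mode ("a"). Adding + to "w" does not help, because "wb+" still truncates the file every time it is opened:

for name in names:
    with open("output.txt", "ab") as f:  # "ab" appends; "wb+" would truncate
        writer = csv.writer(f)
        writer.writerow([name])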

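On the other hand, for efficiency you can open the file just once and use the file's own methods instead of the csv module:

with open("output.txt", "w") as f:
    f.writelines(item + "\n" for item in A)  # writelines adds no newlines itself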
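
A caveat when using file methods directly: writelines writes the strings back to back and does not add line endings. The short Python 3 sketch below shows the difference, using in-memory io.StringIO buffers as a stand-in for real files.

```python
import io

A = ["blah blah", "blah blah", "blah", "list"]

# writelines on its own: the items run together
joined = io.StringIO()
joined.writelines(A)
print(repr(joined.getvalue()))

# Adding the newline explicitly gives one item per line
separated = io.StringIO()
separated.writelines(item + "\n" for item in A)
print(repr(separated.getvalue()))
```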

answered 2013-06-24T13:45:28.553

Something like this:

>>> import csv
>>> A = ["blah blah", "blah blah", "blah", "list"]
>>> lis = [y for x in A for y in x.split()]
>>> lis
['blah', 'blah', 'blah', 'blah', 'blah', 'list']
>>> it = iter(lis)
>>> with open("output.csv", "wb") as f:
...     writer = csv.writer(f, delimiter=' ')
...     writer.writerows([[x, next(it)] for x in it])
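
The comprehension above relies on the word count being even. With an odd count, the final next(it) has nothing left to return and, as far as I can tell, raises StopIteration. A possible Python 3 variation, with the hypothetical helper pair_up, pads the last pair instead:

```python
from itertools import zip_longest


def pair_up(words, fill=""):
    """Pair consecutive words; pad the last pair if the count is odd."""
    it = iter(words)
    # Passing the same iterator twice makes zip_longest consume two items per pair
    return [list(pair) for pair in zip_longest(it, it, fillvalue=fill)]


print(pair_up(["blah", "blah", "blah", "list"]))
print(pair_up(["a", "b", "c"]))
```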

answered 2013-06-24T13:45:47.597