python - Python Curl writefunction 在第二次调用时不起作用

Question

我用 Python 编写了一个简单的脚本。

它解析来自网页的超链接，然后检索这些链接以解析一些信息。

我有类似的脚本运行并重新使用 writefunction 没有任何问题，由于某种原因它失败了，我不知道为什么。

一般卷曲初始化：

storage = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.USERAGENT, USER_AGENT)
c.setopt(pycurl.COOKIEFILE, "")
c.setopt(pycurl.POST, 0)
c.setopt(pycurl.FOLLOWLOCATION, 1)
#Similar scripts are working this way, why this script not?
c.setopt(c.WRITEFUNCTION, storage.write)

第一次调用检索链接：

URL = "http://whatever"
REFERER = URL

c.setopt(pycurl.URL, URL)
c.setopt(pycurl.REFERER, REFERER)
c.perform()

#Write page to file
content = storage.getvalue()
f = open("updates.html", "w")
f.writelines(content)
f.close()
... Here the magic happens and links are extracted ...

现在循环这些链接：

for i, member in enumerate(urls):
    URL = urls[i]
    print "url:", URL
    c.setopt(pycurl.URL, URL)
    c.perform()

    #Write page to file
    #Still the data from previous!
    content = storage.getvalue()
    f = open("update.html", "w")
    f.writelines(content)
    f.close()
    #print content
    ... Gather some information ...
    ... Close objects etc ...

score 0 · Accepted Answer

如果您想按顺序将 url 下载到不同的文件（无并发连接）：

for i, url in enumerate(urls):
    c.setopt(pycurl.URL, url)
    with open("output%d.html" % i, "w") as f:
        c.setopt(c.WRITEDATA, f) # c.setopt(c.WRITEFUNCTION, f.write) also works
        c.perform()

笔记：

storage.getvalue()返回storage从创建那一刻起写入的所有内容。在您的情况下，您应该在其中找到多个 url 的输出
open(filename, "w") 覆盖文件（以前的内容消失了），即update.html 包含循环content的最后一次迭代中的任何内容

python - Python Curl writefunction 在第二次调用时不起作用

1 回答 1

Related

Reference