python - Downloading images and PDFs with python (robobrowser)

Question

I'm using robobrowser to log in to a password-protected website. I can download the HTML code and edit it. However, when I use the following approach:

from robobrowser import RoboBrowser

br = RoboBrowser(history=True)
url = 'https://dummywebsite.html/dummy.pdf'
br.open(url)
pdf_file = '/localdir/local.pdf'
with open(pdf_file, 'wb') as output:
    output.write("%s" % (br.parsed))

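the output is not a valid PDF file. The same happens when I try to download images. I've looked through the documentation but haven't found anything yet. The alternative seems to be mechanize, but it doesn't support Python 3.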

I'd appreciate any help or pointers. Also, if robobrowser can't handle this, any other alternative would be a great help.

score 2 · Accepted Answer

You can try using the requests Session object that RoboBrowser also provides (browser.session):

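from robobrowser import RoboBrowser

url = "https://dummywebsite.html/dummy.pdf"
pdf_file_path = "/localdir/local.pdf"

browser = RoboBrowser(history=True)
# do the login (e.g. via a login form)
# browser.session is a requests.Session, so it reuses the login cookies
response = browser.session.get(url, stream=True)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page

with open(pdf_file_path, "wb") as pdf_file:
    pdf_file.write(response.content)  # raw bytes, not parsed HTML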

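This approach also lets you access files that are only available after you have logged in (the login state is usually kept in cookies, which the session object stores and sends with every request).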
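With `stream=True`, requests defers downloading the body, but reading `response.content` still loads the whole file into memory. For large PDFs or images, a sketch that writes the body in chunks might look like the following. `write_chunks` is a hypothetical helper, not part of RoboBrowser or requests; `iter_content(chunk_size=...)` is the standard requests method for iterating over a streamed body.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` in binary mode; return the number of bytes written.

    Hypothetical helper for illustration only.
    """
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total


# Usage with the logged-in session from the answer above (needs network access):
# response = browser.session.get(url, stream=True)
# response.raise_for_status()
# write_chunks(response.iter_content(chunk_size=8192), pdf_file_path)
```

Taking a plain iterable of bytes keeps the helper independent of requests, so it can be tested without a network connection.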

score 1 · Accepted Answer

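You have to write the entire raw content of the returned page (the PDF) to the file, i.e. the bytes in br.response.content rather than br.parsed, which is the BeautifulSoup parse of the response. This code should work: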

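from robobrowser import RoboBrowser

br = RoboBrowser(history=True)
url = 'https://dummywebsite.html/dummy.pdf'
br.open(url)
pdf_file = '/localdir/local.pdf'

# br.response is the underlying requests Response; .content holds the raw bytes
content = br.response.content

with open(pdf_file, "wb") as output:
    output.write(content)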
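If the saved file still won't open, it can help to inspect its first bytes. The sketch below checks for a few well-known "magic number" signatures. `sniff_file_type` is a hypothetical helper, and the signature list covers only a handful of common formats. Getting `"html"` back typically means the server returned a web page (for example a login page) instead of the file.

```python
def sniff_file_type(data: bytes) -> str:
    """Guess a file type from its leading bytes (hypothetical helper, few formats only)."""
    signatures = [
        (b"%PDF-", "pdf"),
        (b"\x89PNG\r\n\x1a\n", "png"),
        (b"\xff\xd8\xff", "jpeg"),
        (b"GIF87a", "gif"),
        (b"GIF89a", "gif"),
    ]
    for magic, kind in signatures:
        if data.startswith(magic):
            return kind
    # An HTML document here often means a login or error page was saved instead
    head = data.lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


# Usage after br.open(url) (needs network access):
# print(sniff_file_type(br.response.content))
```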
