python - How do I download a text file or other objects from a web page using Python?

Question

I am writing a function that downloads and stores today's list of pre-release domains (a .txt file) from http://www.namejet.com/pages/downloads.aspx. I am trying to do this with json.

import json
import requests

def hello():
    r = requests.get('http://www.namejet.com/pages/downloads.aspx') 
    #Replace with your website URL

    with open("a.txt", "w") as f: 
    #Replace with your file name
        for item in r.json or []:
            try:
                f.write(item['name']['name'] + "\n") 
            except KeyError: 
                pass  

hello()

I need to download the file containing the pre-release domains using Python. How can I do that? Is the code above the right approach?

score 2 · Accepted Answer

I don't think mechanize is much use for JavaScript; use selenium instead. Here's an example:

In [1]: from selenium import webdriver
In [2]: from selenium.webdriver.common.by import By
In [3]: browser = webdriver.Chrome()  # select the browser you want to automate
In [4]: browser.get('http://www.namejet.com/pages/downloads.aspx')
In [5]: # find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
In [6]: element = browser.find_element(
            By.XPATH, '//a[@id="ctl00_ContentPlaceHolder1_hlPreRelease1"]')

In [7]: element.click()

Now you can find prerelease_10-08-2012.txt in your download folder and open it in the usual way.
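
Once the file is on disk, storing or processing the list is plain file handling. The following is a minimal sketch, assuming the file holds one domain per line (the thread does not show its format); `read_domains` and the `Downloads` path are hypothetical names used only for illustration.

```python
from pathlib import Path


def read_domains(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    # Assumption: one domain per line; undecodable bytes are replaced, not fatal.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical location; the real download folder depends on browser settings.
    download = Path.home() / "Downloads" / "prerelease_10-08-2012.txt"
    if download.exists():
        for domain in read_domains(download):
            print(domain)
```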

score 0 · Accepted Answer

I see a few problems with your approach:

The page doesn't return any JSON, so even if you access the page successfully, r.json will be empty:

>>> import requests
>>> r = requests.get('http://www.namejet.com/pages/downloads.aspx')
>>> r.json
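
One way to guard against this is to check the declared content type before decoding. In current versions of requests, `json` is a method rather than an attribute, and calling `r.json()` on an HTML body raises a `ValueError` subclass instead of returning an empty value. The sketch below uses only the standard library; `parse_json_or_none` is a hypothetical helper, not part of requests.

```python
import json


def parse_json_or_none(content_type, body):
    """Return the decoded JSON document, or None if the response is not JSON."""
    # Only attempt to decode when the server declares a JSON media type.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        return None
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
```

With requests, this could be called as `parse_json_or_none(r.headers.get("Content-Type"), r.text)`; for an HTML page such as the one in the question it would return None.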

The file you are after is hidden behind a postback link; you cannot "execute" it with requests, because requests does not understand JavaScript.
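
To make the postback point concrete: on ASP.NET Web Forms pages, such links usually call `__doPostBack(eventTarget, eventArgument)`, which submits the page's form with those two values (as `__EVENTTARGET` and `__EVENTARGUMENT`) alongside hidden fields such as `__VIEWSTATE`. The sketch below only inspects HTML to find such links; the `SAMPLE` markup is invented for illustration and is not copied from the real page, and `find_postbacks` is a hypothetical helper.

```python
import re
from html.parser import HTMLParser

# Matches the two quoted arguments of a __doPostBack('target','argument') call.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostbackFinder(HTMLParser):
    """Collect (event_target, event_argument) pairs from postback <a> links."""

    def __init__(self):
        super().__init__()
        self.postbacks = {}  # maps the link's id attribute to its argument pair

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        match = POSTBACK_RE.search(attributes.get("href") or "")
        if match:
            self.postbacks[attributes.get("id")] = match.groups()


def find_postbacks(html):
    """Return {link id: (event_target, event_argument)} for postback links."""
    finder = PostbackFinder()
    finder.feed(html)
    return finder.postbacks


# Invented markup, assuming the link follows the usual Web Forms pattern.
SAMPLE = """
<a id="ctl00_ContentPlaceHolder1_hlPreRelease1"
   href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$hlPreRelease1','')">Pre-Release</a>
<a id="plain" href="/about.aspx">About</a>
"""

print(find_postbacks(SAMPLE))
```

A plain GET of the page never triggers this submission, which is why the file does not arrive with the request shown in the question.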

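Given the above, a better approach is to use mechanize or an alternative to simulate a browser. You could also ask the company to give you a direct link.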
