python - urllib2 gets HTTP code 404 on a site where Firefox gets code 200

Question

I am trying to scrape data from an internal website using urllib2. When I run

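import urllib2
from urllib2 import HTTPError, URLError

try:
    resp = urllib2.urlopen(urlBase)
    data = resp.read()
except HTTPError as e1:
    # The server answered, but with an error status such as 404.
    print("HTTP Error %d trying to reach %s" % (e1.code, urlBase))
except URLError as e2:
    # No HTTP response at all (DNS failure, refused connection, ...);
    # a plain URLError carries a reason but has no code or read().
    print("URLError: %s" % (e2.reason,))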

I get an HTTPError with e1.code equal to 404. If I navigate to the site in Firefox and open the developer tools, I see HTTP code 200. Does anyone know what the problem might be?

Edit 1: Before calling this, I also install an empty proxy handler so that urllib2 does not try to use the proxy settings from my shell:

handler = urllib2.ProxyHandler({})
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)

Edit 2: FWIW, the URL I am navigating to is an Apache index, not an HTML document. However, the status line Firefox reads is still HTTP/1.1 Status 200.

score 0 · Accepted Answer

Turns out a function inside the try block, which I had stripped out of the example above, was trying to access another page, and that request was the one triggering the 404 error.
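For anyone hitting the same thing, one way to find out which request is really failing is to give each URL its own try block and record the outcome per URL. The following is only a sketch: `check_urls`, `fake_open`, `FakeResponse`, and the `intranet.example` URLs are made-up names for illustration, and the stub opener is there just so the example runs without a network. Pass the real `urlopen` (the default) to test actual pages.

```python
# Runs on Python 2 (urllib2) and Python 3 (urllib.request / urllib.error).
try:
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError


def check_urls(urls, open_func=urlopen):
    """Request each URL in its own try block and record what happened.

    Returns a list of (url, outcome) pairs. The outcome is the HTTP status
    code when the server answered, or a string for non-HTTP failures.
    """
    results = []
    for url in urls:
        try:
            resp = open_func(url)
            results.append((url, resp.getcode()))
        except HTTPError as e:
            # HTTPError is a subclass of URLError, so catch it first.
            results.append((url, e.code))
        except URLError as e:
            results.append((url, "URLError: %s" % (e.reason,)))
    return results


# Hypothetical stand-ins for a real response and for urlopen.
class FakeResponse(object):
    def getcode(self):
        return 200


def fake_open(url):
    # Pretend that only the "/missing" page does not exist.
    if url.endswith("/missing"):
        raise HTTPError(url, 404, "Not Found", {}, None)
    return FakeResponse()


results = check_urls(
    ["http://intranet.example/index/", "http://intranet.example/missing"],
    open_func=fake_open,
)
for url, outcome in results:
    print("%s -> %s" % (url, outcome))
```

Because every result is paired with the URL that produced it, a 404 coming from a helper function's request cannot be mistaken for a 404 on the main page.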

score 0 · Accepted Answer

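This sometimes happens to me after I have been using an HTTP proxy such as Charles. In my case, the fix was simply to turn the HTTP proxy on and then off again.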
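If you suspect leftover proxy settings, it may help to print what Python itself detects before making the request. This is a small sketch using the standard library's `getproxies()`, which reads the proxy environment variables and, on macOS and Windows, can fall back to the system proxy configuration. What it prints depends entirely on your machine.

```python
try:
    from urllib import getproxies  # Python 2
except ImportError:
    from urllib.request import getproxies  # Python 3

# Mapping of URL scheme -> proxy URL, as detected by the standard library.
proxies = getproxies()

if proxies:
    for scheme, proxy_url in sorted(proxies.items()):
        print("%s -> %s" % (scheme, proxy_url))
else:
    print("no proxy settings detected")
```

An entry pointing at a local port here, while the proxy tool is closed, would be consistent with a stale setting of the kind described above.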
