
I am opening a URL in Python with the following code, and sometimes I get this error:

from urllib import urlopen
url = "http://www.gutenberg.org/files/2554/2554.txt"
raw = urlopen(url).read()

Error: '\n\n403 Forbidden\n\nForbidden\nYou don\'t have permission to access /files/2554/2554.txt\non this server.\n\nApache Server at www.gutenberg.org Port 80\n\n'

What does this mean?

Thanks


1 Answer


The server is refusing the request (HTTP 403 Forbidden) because of its 'User-Agent' header: by default, urllib identifies itself as 'Python-urllib', and the site blocks that client.
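To see which 'User-Agent' value the library sends by default, the opener's default headers can be inspected without making any request. This is a minimal sketch that assumes Python 3, where the functionality of urllib2 lives in urllib.request; the helper name default_user_agent is made up for illustration.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib.request sends when none is set.

    Hypothetical helper for illustration; assumes Python 3.
    """
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request
    headers = dict(opener.addheaders)
    return headers.get("User-agent")


# Prints a string starting with "Python-urllib/" followed by the Python version
print(default_user_agent())
```

A server that filters on this value can reject the request before looking at anything else, which would be consistent with the 403 response in the question.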

To get around this, import the 'urllib2' module (it is part of the Python 2 standard library, so there is nothing to download) and use this code:

import urllib2

# Send a browser-like User-Agent instead of the default Python one
req = urllib2.Request(url, headers={'User-Agent': 'Chrome'})
raw = urllib2.urlopen(req).read()

You are now accessing the site with the 'User-Agent' header set to 'Chrome' and should no longer get the 403 Forbidden response (I tried it myself and it worked).
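The urllib2 module exists only in Python 2. On Python 3 the same idea might look roughly like the sketch below, which uses urllib.request and urllib.error instead. The helper names build_request and fetch are hypothetical, and the 'Chrome' value is simply the one used above; whether a given server accepts it is not guaranteed.

```python
import urllib.error
import urllib.request

URL = "http://www.gutenberg.org/files/2554/2554.txt"


def build_request(url, user_agent="Chrome"):
    """Build a request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Chrome"):
    """Return the response body as bytes, or None on an HTTP error."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses, e.g. 403 if the server still refuses
        print("Request failed:", err.code, err.reason)
        return None


# Usage (performs a real network request, so it is left commented out):
# raw = fetch(URL)
```

Unlike the Python 2 urllib.urlopen call in the question, which returned the error page as ordinary text, urllib.request.urlopen raises HTTPError on a 403, so the sketch catches it explicitly.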

Hope this helps.

Answered 2013-03-19T13:58:26.937