python - Unable to fetch a website with Python urllib.urlopen() or with any web browser other than Shiretoko

Question

This is the URL of the site I am trying to fetch:

https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags

When I fetch the site and display its contents with the following code:

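import urllib
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path (assumed; Python 2)

# Fetch the page, then parse and pretty-print the returned HTML
sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff's+tags")
html = sock.read()
sock.close()
soup = BeautifulSoup(html)
print soup.prettify()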

I get the following output:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
 <head>
  <title>
   Error message
  </title>
 </head>
 <body>
  <h2>
   Invalid input data
  </h2>
 </body>
</html>

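I get the same result with urllib2. The interesting part is that this URL works only in the Shiretoko web browser v3.5.7. (By "works" I mean that it brings up the correct page.) When I enter this URL in Firefox 3.0.15 or Konqueror v4.2.2, I get exactly the same error page (with "Invalid input data"). I don't know what causes this difference or how to fetch this page with Python. Any ideas?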

Thanks
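
One detail visible in the question: the URL quoted at the top escapes the apostrophe as %27, while the string passed to urlopen() contains a raw apostrophe. Nothing in this thread confirms that the server cares about the difference, so treat this only as something to rule out. A minimal sketch, assuming Python 3 (where the encoding helpers live in urllib.parse), that builds the query string from plain values so the escaping is done consistently:

```python
from urllib.parse import urlencode

BASE_URL = "https://salami.parc.com/spartag/GetRepository"

# Query parameters as plain text; the apostrophe and the space are not escaped yet.
params = [
    ("friend", "jmankoff"),
    ("keywords", "antibiotic"),
    ("option", "jmankoff's tags"),
]

# urlencode() percent-encodes the apostrophe as %27 and turns the space into '+'.
query = urlencode(params)
url = BASE_URL + "?" + query
print(url)
```

The resulting string matches the escaped URL given at the top of the question, so it can be passed to the fetching code in place of the hand-written one.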

score 2 · Accepted Answer

If you look at the urllib2 documentation, it says:

urllib2.build_opener([handler, ...])

    .....
    If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added. 

    .....

You can try using urllib2 together with the ssl module. Alternatively, you can use httplib.
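
As a rough illustration of the suggestion above, here is a sketch that checks for SSL support and builds an opener with an explicit HTTPS handler. It assumes Python 3, where urllib2 was merged into urllib.request; the helper name make_https_opener is made up for this example, and nothing here has been tried against the site in the question.

```python
import urllib.request

try:
    import ssl  # present only if Python was built with SSL support
except ImportError:
    ssl = None


def make_https_opener():
    """Return an opener that handles https:// URLs (hypothetical helper)."""
    if ssl is None:
        raise RuntimeError("this Python installation has no SSL support")
    # A default context verifies certificates against the system trust store.
    context = ssl.create_default_context()
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context)
    )


opener = make_https_opener()
# List the installed handlers to confirm that HTTPS is covered.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

A page would then be fetched with opener.open(url).read(). If the import of ssl fails, that alone would explain HTTPS requests not working from Python.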

score 0 · Accepted Answer

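That is exactly what you get when you click the link in a web browser. Perhaps you need to be logged in, or have a cookie set, or something similar.

I got the same message with Firefox 3.5.8 (Shiretoko) on Linux.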
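
If a cookie really is what the server expects, one way to test the idea from Python is to give the opener a cookie jar, so that cookies set by an earlier response (a login page, for instance) are sent back on later requests. This is only a sketch under that assumption, written for Python 3; make_cookie_opener is a name invented for the example, and the thread does not say which cookie or login step the site needs.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = make_cookie_opener()
# The jar starts empty; it fills up as responses carrying Set-Cookie headers arrive.
print(len(jar))
```

Requesting the login or landing page first with opener.open(...) and then the target URL with the same opener would show whether the error page goes away once the cookies are in place.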
