python - How do I scrape a Twitter page with Python?

Question

When I try to scrape Twitter with this code:

import urllib2
s = "https://mobile.twitter.com/bing/"
html = urllib2.urlopen(s).read()
print html

...I get the following error:

Traceback (most recent call last):
  File "C:\Users\arpit\Downloads\Desktop\Wiki Code\final Crawler_wiki.py", line 14, in <module>
    html = urllib2.urlopen(s).read()
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 418, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

If I replace mobile.twitter.com with twitter.com, it works, but I want it to work with mobile.twitter.com.

score 0 · Accepted Answer

The Twitter site may be checking for a User-Agent header, which you are not setting when you make the request through the urllib API.

You may need to use something like mechanize to spoof your user agent.

But I strongly recommend using the Twitter API, which offers many simple and effective ways to work with the data.
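
For the User-Agent suggestion, here is a minimal sketch that uses only the standard library rather than mechanize. It is written for Python 3 (`urllib.request`); in Python 2 the equivalent `Request` and `urlopen` live in `urllib2`. The `USER_AGENT` string and the helper names `build_request` and `fetch` are made up for illustration, and nothing here confirms that a missing User-Agent is what causes the "connection refused" error in the question.

```python
# Python 3 sketch; in Python 2 the same classes are found in urllib2.
import urllib.request

# Hypothetical User-Agent string, chosen for illustration only.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch the response body as bytes.

    Raises urllib.error.URLError if the connection fails or is refused.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Only builds the request and shows the header; no network access here.
    req = build_request("https://mobile.twitter.com/bing/")
    print(req.get_header("User-agent"))
```

Calling `fetch("https://mobile.twitter.com/bing/")` would then send the request with that header. If the same URLError still appears, the cause is more likely at the network level (a proxy, firewall, or the host refusing the connection) than in the headers.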

