Hi, I'm trying to scrape some data from this URL:
http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1
As you may have noticed, if the cookie and session data have not been set yet, you are redirected to the base URL ( http://www.21cineplex.com/ ).
I tried doing it like this:
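import re
import urllib2
from cookielib import CookieJar

def main():
    try:
        cj = CookieJar()
        baseurl = "http://www.21cineplex.com"
        # Build an opener that stores cookies, then visit the base URL first
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.open(baseurl)
        # Make the cookie-aware opener the default for urllib2.urlopen
        urllib2.install_opener(opener)
        movieSource = urllib2.urlopen('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1').read()
        splitSource = re.findall(r'<ul class="w462">(.*?)</ul>', movieSource)
        print splitSource
    except Exception, e:
        # Show the actual error instead of discarding it
        print "Error occurred in main block:", str(e)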
However, I still could not scrape that particular URL.
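One thing that may be worth ruling out separately from the cookies: in Python's `re` module, `.` does not match a newline unless `re.DOTALL` is passed, so a `<ul>` that spans several lines would give an empty list even when the page was fetched correctly. The sketch below illustrates this on an invented HTML fragment (`sample_html` is an assumption; I have not checked the real markup of the page).

```python
import re

# Invented fragment standing in for the real page; the actual markup may differ.
sample_html = '<ul class="w462">\n  <li>Movie A</li>\n  <li>Movie B</li>\n</ul>'

pattern = r'<ul class="w462">(.*?)</ul>'

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
single_line = re.findall(pattern, sample_html)

# With re.DOTALL, "." also matches newlines and the whole list body is captured.
multi_line = re.findall(pattern, sample_html, re.DOTALL)

# Pull the individual items out of the captured block.
titles = re.findall(r'<li>(.*?)</li>', multi_line[0])

print(single_line)
print(titles)
```

If the real list sits on a single line, this is not the cause and the pattern can stay as it is.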
A quick check shows that the site sets a session ID (PHPSESSID) and copies it into the client's cookies.
The question is: how do I handle a case like this?
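To make the expected behaviour concrete, here is a minimal offline sketch of the "visit the base URL first, then reuse the same cookie-aware opener" pattern. Everything about the server is an assumption: `SessionGate` is a hypothetical local stand-in that redirects inner pages to `/` until a `PHPSESSID` cookie is sent, and `make_session_opener` is an illustrative helper, not part of any library. The real site may check more than the cookie (for example headers), so this only shows that the opener mechanics work.

```python
import threading

try:  # Python 3 module names
    from http.cookiejar import CookieJar
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener
except ImportError:  # Python 2 module names, as used in the question
    from cookielib import CookieJar
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    from urllib2 import HTTPCookieProcessor, ProxyHandler, build_opener


class SessionGate(BaseHTTPRequestHandler):
    """Hypothetical site: inner pages redirect home until a session cookie is sent."""

    def do_GET(self):
        has_session = "PHPSESSID=" in (self.headers.get("Cookie") or "")
        if self.path != "/" and not has_session:
            self.send_response(302)
            self.send_header("Location", "/")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"home" if self.path == "/" else b"listing"
        self.send_response(200)
        if not has_session:
            self.send_header("Set-Cookie", "PHPSESSID=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def make_session_opener(*extra_handlers):
    """Return (opener, jar); every request through the opener shares the jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar), *extra_handlers), jar


server = HTTPServer(("127.0.0.1", 0), SessionGate)
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

base = "http://127.0.0.1:%d/" % server.server_address[1]
page = base + "nowplaying"
no_proxy = ProxyHandler({})  # keep the local demo off any configured proxy

# No cookie jar: the inner page redirects to the home page.
cold = build_opener(no_proxy).open(page).read()

# Same opener for both requests: the cookie from "/" is sent with the second one.
opener, jar = make_session_opener(no_proxy)
opener.open(base).read()
warm = opener.open(page).read()
cookie_names = [cookie.name for cookie in jar]

server.shutdown()
server.server_close()

print(cold, warm, cookie_names)
```

Inspecting the jar after the first request, as done with `cookie_names` here, is a quick way to confirm whether the real site's session cookie was actually stored before the second request goes out.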
P.S. I tried to install request (via pip), and it gave me a 404:
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Getting page https://pypi.python.org/simple/
URLs to search for versions for request:
* https://pypi.python.org/simple/request/
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Could not find any downloads that satisfy the requirement request
Cleaning up...