
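I'm having trouble downloading a web page with the Python requests library. The code worked fine until it suddenly stopped working. Here is the code: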

import requests
user_agent = {'User-agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17'}
url = 'http://maya.tase.co.il/bursa/index.asp?view=search&group_first_level=1&company_group=3000&arg_comp=&srh_comp_lb=1349&srh_from=2013-2-2&srh_until=2013-2-12&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press='
r = requests.get(url, headers=user_agent)
r.text

Instead, I get some kind of CRC-related response:

u'\r\n\r\n\r\n\r\nfunction str_reverse(in_str) { return ((in_str.split("")).reverse()).join(""); }\r\nfunction test(){var table = " charCodeAt(0);\r\nvar arr = new Array(n);\r\nvar m = Math.pow(((end - start) + 1),n);\r\nfor (var i=0; i=0;--j) {\r\nvar t = arr[j].charCodeAt(0);\r\nt++; arr[j] = String.fromCharCode(t);\r\nif (arr[j].charCodeAt(0)<=end) {\r\nbreak;} else { arr[j] = s1 ;}}\r \nvar chlg = arr.join(""); var str = chlg + slt;\r\nvar crc = 0;\r\nvar crc = crc ^ (-1);\r\nfor( var k = 0, iTop = str.length; k < iTop; k++ ) { crc = (crc >> 8) ^ ("0x" + table.substr(((crc ^ str.charCodeAt(k) ) & 0x000000FF) * 9, 8));}\r\ncrc = crc ^ (- 1);\r\ncrc = Math.abs(crc);\r\nif (crc == parseInt(c)){break;}}\r\ndocument.cookie = "TSe8eecf_75=" + "679f30133acfc6e01dcf7ef59af22666:" + chlg + ":" + slt + ":" + crc + ";Max-Age=3600;path=/";\r\ndocument.forms[0].elements[2]。
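The returned snippet looks like a proof-of-work style challenge: it tries candidate strings chlg of length n built from characters in the range start..end, appends a salt slt, runs a CRC loop over the result, and stops when the value equals c; the answer is then written into the TSe8eecf_75 cookie as "<prefix>:chlg:slt:crc". The Python sketch below mirrors that loop, including the signed >> shift the JavaScript uses. It is a sketch under assumptions: the values of c, slt, start, end, n and table are cut off in the output above and would have to be extracted from the page; table is assumed to hold 256 8-digit hex entries spaced 9 characters apart (as implied by table.substr(idx * 9, 8)); candidates are enumerated with itertools.product rather than the page's own odometer loop; and the helper names (js_crc, solve_challenge, challenge_cookie, standard_crc32_table) are hypothetical.

```python
import itertools


def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like JavaScript's ToInt32."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def js_crc(s: str, table: str) -> int:
    """Mirror the snippet's CRC loop, including its signed `>>` shift.

    Assumes `s` contains only BMP characters, so ord() matches charCodeAt().
    """
    crc = to_int32(0 ^ -1)
    for ch in s:
        idx = (crc ^ ord(ch)) & 0xFF
        entry = int(table[idx * 9: idx * 9 + 8], 16)  # "0x" + table.substr(idx * 9, 8)
        crc = to_int32((crc >> 8) ^ entry)  # Python's >> is arithmetic, like JS >>
    crc = to_int32(crc ^ -1)
    return abs(crc)  # Math.abs(crc)


def solve_challenge(c: int, slt: str, start: int, end: int, n: int, table: str):
    """Brute-force an n-character string over chr(start)..chr(end) whose CRC equals c."""
    alphabet = [chr(code) for code in range(start, end + 1)]
    for combo in itertools.product(alphabet, repeat=n):
        chlg = "".join(combo)
        crc = js_crc(chlg + slt, table)
        if crc == c:
            return chlg, crc
    return None  # no candidate matched


def challenge_cookie(prefix: str, chlg: str, slt: str, crc: int) -> str:
    """Assemble the TSe8eecf_75 value in the snippet's "<prefix>:chlg:slt:crc" layout."""
    return f"{prefix}:{chlg}:{slt}:{crc}"


def standard_crc32_table() -> str:
    """Demo table in the assumed layout, using the standard CRC-32 polynomial."""
    entries = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        entries.append(f"{c:08X}")
    return " ".join(entries)  # 8 hex digits + 1 separator = stride of 9
```

Because of the signed shift, js_crc generally differs from zlib.crc32 even when the page's table happens to be the standard one, which is why the loop is reproduced here instead of calling a library CRC.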

When I open the same URL in Chrome, the page loads without any problem, so it isn't a blocked IP or user agent.

What could be the problem?


2 Answers


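The page works like this: a GET request to the URL returns a JavaScript snippet, and that snippet then makes a POST request back to the server. Here is the HTTP header output captured with the "Live HTTP Headers" Firefox extension: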

http://maya.tase.co.il/bursa/index.asp?view=search&group_first_level=1&company_group=3000&arg_comp=&srh_comp_lb=1349&srh_from=2013-2-2&srh_until=2013-2-12&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press=

GET /bursa/index.asp?view=search&group_first_level=1&company_group=3000&arg_comp=&srh_comp_lb=1349&srh_from=2013-2-2&srh_until=2013-2-12&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press= HTTP/1.1
Host: maya.tase.co.il
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: LBMaya=2; TS7c8b94=21efb14c7892000e308875477ef97a44a4e4ec0c353c57aa511a0278; TSe8eecf=577463c4dff39a4cd00ac387822be6f8a4e4ec0c353c57aa511a06039dbef3f4669e6cc2; ASPSESSIONIDAQRASAAB=APJODHLAKLHBLDBFFIJFBMKD; __utma=204212108.1708073596.1360658775.1360658775.1360658775.1; __utmb=204212108; __utmc=204212108; __utmz=204212108.1360658775.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); TSe8eecf_75=607adbfcad09964b707d73029eb6c75e:ywvw:I5Krl8W8:448224545; TSe8eecf_31=52f39e84b867da063487356c8141f112a4e4ec0c353c57aa000000000000000000fe8ffe0cfdc0fd2efc81fc6ffb87fafffa28f6def59df515f4baf390ef06ede6ed08eca7ec49eba1ea0ee757e533e49ce375dff9df7dded2de56dd9ddd87d93ad895d7a8d607d5cccfdfcfb5cf5bcef4ce70ce1acc14cc0ecb3ac91cc8b3c78ec621c445c1dfbbe4bb0abaa5ba4bb331b29eb1d7b1c8b0faad85ad6bacc4ac2aabc2ab2caa83aa6da317a2b8a1739ff09f1e9eb19e5f99b7995998f6981893e8918c90238fd68f388f0b8e978e798c678991897f88d0883e826181aa80057a967a8c72dd718870276b1f6b05635461ae60015be05a4f599e53db527451cf4bc64a69481743fd42524046362734f931eb30442f5f271326cd257021cd206217eb066202cf

HTTP/1.1 200 OK
Content-Length: 4323
Pragma: no-cache
Date: Tue, 12 Feb 2013 09:02:43 GMT
Connection: keep-alive
----------------------------------------------------------
http://maya.tase.co.il/bursa/index.asp?view=search&group_first_level=1&company_group=3000&arg_comp=&srh_comp_lb=1349&srh_from=2013-2-2&srh_until=2013-2-12&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press=

POST /bursa/index.asp?view=search&group_first_level=1&company_group=3000&arg_comp=&srh_comp_lb=1349&srh_from=2013-2-2&srh_until=2013-2-12&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press= HTTP/1.1
Host: maya.tase.co.il
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://maya.tase.co.il/bursa/index.asp?view=search&group_first_level=1&company_group=3000&arg_comp=&srh_comp_lb=1349&srh_from=2013-2-2&srh_until=2013-2-12&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press=
Cookie: LBMaya=2; TS7c8b94=21efb14c7892000e308875477ef97a44a4e4ec0c353c57aa511a0278; TSe8eecf=577463c4dff39a4cd00ac387822be6f8a4e4ec0c353c57aa511a06039dbef3f4669e6cc2; ASPSESSIONIDAQRASAAB=APJODHLAKLHBLDBFFIJFBMKD; __utma=204212108.1708073596.1360658775.1360658775.1360658775.1; __utmb=204212108; __utmc=204212108; __utmz=204212108.1360658775.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); TSe8eecf_75=b1977e80a3c5a7cb63469954c8ac6e05:lmnj:yjv3396F:69895458; TSe8eecf_31=52f39e84b867da063487356c8141f112a4e4ec0c353c57aa000000000000000000fe8ffe0cfdc0fd2efc81fc6ffb87fafffa28f6def59df515f4baf390ef06ede6ed08eca7ec49eba1ea0ee757e533e49ce375dff9df7dded2de56dd9ddd87d93ad895d7a8d607d5cccfdfcfb5cf5bcef4ce70ce1acc14cc0ecb3ac91cc8b3c78ec621c445c1dfbbe4bb0abaa5ba4bb331b29eb1d7b1c8b0faad85ad6bacc4ac2aabc2ab2caa83aa6da317a2b8a1739ff09f1e9eb19e5f99b7995998f6981893e8918c90238fd68f388f0b8e978e798c678991897f88d0883e826181aa80057a967a8c72dd718870276b1f6b05635461ae60015be05a4f599e53db527451cf4bc64a69481743fd42524046362734f931eb30442f5f271326cd257021cd206217eb066202cf
Content-Type: application/x-www-form-urlencoded
Content-Length: 69
TSe8eecf_id=1&TSe8eecf_md=1&TSe8eecf_rf=0&TSe8eecf_ct=0&TSe8eecf_pd=0
HTTP/1.1 200 OK
Content-Type: text/html
X-Maya: 2
Vary: Accept-Encoding
X-Powered-By: ASP.NET
Content-Encoding: gzip
Date: Tue, 12 Feb 2013 09:02:45 GMT
Content-Length: 10652
Connection: keep-alive
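The capture shows the browser's sequence: a GET that returns the challenge, then a POST to the same URL carrying a freshly computed TSe8eecf_75 cookie, the page itself as Referer, and a small form body. A minimal sketch of replaying that sequence from Python, assuming a requests.Session-like object so that cookies set by the server persist between the two calls. The form fields are copied from the captured POST body, but whether they stay constant is an assumption; solve_cookie is a hypothetical callback that would parse the challenge page and return (cookie_name, cookie_value); and the server may check other things this sketch does not reproduce.

```python
# Form body copied from the captured POST above; assumed (not known) to be constant.
CHALLENGE_FORM = {
    "TSe8eecf_id": "1",
    "TSe8eecf_md": "1",
    "TSe8eecf_rf": "0",
    "TSe8eecf_ct": "0",
    "TSe8eecf_pd": "0",
}


def fetch_through_challenge(session, url, solve_cookie, headers=None):
    """GET the URL; if the JS challenge comes back, set its cookie and POST like the browser did.

    `session` should behave like requests.Session (cookies persist across calls).
    `solve_cookie` is a hypothetical callback: challenge page text -> (name, value).
    """
    headers = dict(headers or {})
    first = session.get(url, headers=headers)
    if "document.cookie" not in first.text:
        return first  # no challenge: this is already the real page
    name, value = solve_cookie(first.text)
    session.cookies.set(name, value, path="/")  # e.g. TSe8eecf_75
    headers["Referer"] = url  # the browser sent the page itself as Referer
    return session.post(url, data=CHALLENGE_FORM, headers=headers)
```

Usage would look like fetch_through_challenge(requests.Session(), url, my_solver, headers=user_agent), with user_agent and url from the question and my_solver standing in for whatever extracts and solves the challenge.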
answered 2013-02-12T09:06:50.107

The page probably contains AJAX/JavaScript code that loads the real data. Write the output to a file and open it in a browser or text editor.
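A minimal sketch of that suggestion: save the response text to a file for inspection, and also report whether it looks like the JavaScript challenge rather than real data. The function name save_page and the default file name are hypothetical, and the marker check (both "document.cookie" and "function" present) is a rough heuristic based only on the output shown in the question.

```python
from pathlib import Path


def save_page(html: str, path: str = "page.html") -> bool:
    """Write the response text to `path` and return True if it looks like the JS challenge."""
    Path(path).write_text(html, encoding="utf-8")
    # Heuristic only: the challenge page shown in the question sets a cookie from a script.
    return "document.cookie" in html and "function" in html
```

Called as save_page(r.text) after the requests.get(...) line from the question, it leaves page.html in the working directory to open in a browser or editor.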

answered 2013-02-12T08:47:50.360