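I'm writing a script that automatically downloads a file from a hostuje.net page, and I decided to use the Requests module for it. The download process works like this:
- When you visit the site, it sets a PHPSESSID cookie.
- The page contains a form with hidden inputs: file ID, hash, file name, server address, MIME type, and "REG" (set to 1 to indicate that you accept the terms of use).
- When you click "Pobierz plik" (download), a POST with that data is sent.
- The page reloads and shows the download dialog.
I read the HTTP headers and decided to send a similar request from a simple script. I use pyQuery to get the input values. The problem is that whenever I send the POST data to the server, I get a 200 instead of a 302 (Found), and the page tells me that my session has expired. Here is my code: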
from pyquery import PyQuery as pq
import requests
(some code here...)
session = requests.session()
page = session.get(download_page)
d = pq(page.content, parser='html_fragments')
# Collect every named <input> that sits directly inside a table cell.
data = {}
for inputs in d("td > input[name]").items():
    data[inputs.attr("name")] = inputs.attr("value")
# Pass the PHPSESSID from the first response along with the POST.
cookies = dict(PHPSESSID=page.cookies['PHPSESSID'])
resp = session.post(download_page, data=data, allow_redirects=False, stream=False, cookies=cookies)
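For readers without pyQuery, the field-collection step can be sketched with the standard library alone. This is only an illustration, not the code used above: `HiddenInputCollector` and `collect_form_fields` are hypothetical names, and the sample markup is a shortened copy of the download form.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs from every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # A missing value attribute is treated as an empty string.
            self.fields[name] = attributes.get("value") or ""


def collect_form_fields(html_text):
    """Hypothetical helper: return the named input fields found in html_text."""
    collector = HiddenInputCollector()
    collector.feed(html_text)
    return collector.fields


# Shortened sample of the download form (assumption: real pages hold more fields).
sample_form = (
    '<form method="post"><input type="hidden" name="OK" value="1">'
    '<input type="hidden" name="mime" value="audio/mpeg">'
    '<input type="hidden" name="REG" value="1">'
    '<input type="submit" value="Pobierz Plik"></form>'
)
print(collect_form_fields(sample_form))
```

Unlike the `td > input[name]` selector, this sketch collects named inputs from every form in the markup, so on the real page it would have to be limited to the download form.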
The web browser sends this POST request:
POST /file.php?id=c4f50ccefcd1c83e2a48374db4644a0a HTTP/1.1
Host: hostuje.net
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: pl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Cookie: PHPSESSID=g565im2lqu0csthglgnlb98tp2
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 164
OK=1&hasz=c4f50ccefcd1c83e2a48374db4644a0a&id=c4f50ccefcd1c83e2a48374db4644a0a&name=Stupid+Fresh-+Spam+Javelin+Original.mp3&mime=audio%2Fmpeg&k=n4.hostuje.org&REG=1
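The body can be rebuilt from the form fields and checked against the Content-Length of 164 in the capture. A minimal sketch using only `urllib.parse`; the field values are copied from the capture, and the variable names are illustrative.

```python
from urllib.parse import urlencode

# Field order copied from the browser capture; a list of pairs preserves it.
browser_fields = [
    ("OK", "1"),
    ("hasz", "c4f50ccefcd1c83e2a48374db4644a0a"),
    ("id", "c4f50ccefcd1c83e2a48374db4644a0a"),
    ("name", "Stupid Fresh- Spam Javelin Original.mp3"),
    ("mime", "audio/mpeg"),
    ("k", "n4.hostuje.org"),
    ("REG", "1"),
]

# urlencode turns spaces into "+" and "/" into "%2F", as the browser did.
body = urlencode(browser_fields)
print(body)
print(len(body))  # compare with the Content-Length header of the capture
```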
The response is as follows:
HTTP/1.1 302 Found
Date: Wed, 03 Apr 2013 14:42:58 GMT
Server: Apache/2
X-Powered-By: PHP/5.3.3-7+squeeze14
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Last-Modified: Wed, 03 Apr 2013 16:42:58 +0200
Location: http://n4.hostuje.org/file.php?id=c4f50ccefcd1c83e2a48374db4644a0a&mime=audio/mpeg&name=Stupid+Fresh-+Spam+Javelin+Original.mp3&sid=eca83088d8cad5bcb194c92e3a5af068
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 26
Connection: close
Content-Type: text/html; charset=ISO-8859-2
That is what I want to achieve with my script.
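Once a script does receive that 302, the Location header carries everything needed for the follow-up request. A sketch of pulling it apart with `urllib.parse`; the variable names are illustrative, and the reading of `sid` as a per-download token is an assumption, since the capture does not say what it is.

```python
from urllib.parse import urlsplit, parse_qs

# Location header copied from the browser capture.
location = (
    "http://n4.hostuje.org/file.php?id=c4f50ccefcd1c83e2a48374db4644a0a"
    "&mime=audio/mpeg&name=Stupid+Fresh-+Spam+Javelin+Original.mp3"
    "&sid=eca83088d8cad5bcb194c92e3a5af068"
)

parts = urlsplit(location)
query = parse_qs(parts.query)

download_host = parts.netloc    # server named in the redirect target
file_name = query["name"][0]    # "+" is decoded back to a space
download_sid = query["sid"][0]  # assumption: a per-download token
print(download_host, file_name, download_sid)
```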
<<class 'requests.cookies.RequestsCookieJar'>[<Cookie PHPSESSID=c0v7cdfa0vkntbfdl8cocr01r4 for hostuje.net/>]>
{'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/1.2.0 CPython/2.7.3 Linux/3.2.0-39-generic'}
{'OK': '1', 'name': 'Stupid Fresh- Spam Javelin Original.mp3', 'k': 'n4.hostuje.org', 'id': 'c4f50ccefcd1c83e2a48374db4644a0a', 'mime': 'audio/mpeg', 'hasz': 'c4f50ccefcd1c83e2a48374db4644a0a', 'REG': '1'}
These are session.cookies, session.headers, and the data dictionary.
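One way to see how the script's request differs from the browser's is to diff the two header sets. A small sketch with a hypothetical helper, `diff_headers`; the values are copied from the captures, and per-request headers (Host, Cookie, Content-Type, Content-Length) are left out on purpose.

```python
browser_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "pl,en-us;q=0.7,en;q=0.3",
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Connection": "keep-alive",
}
script_headers = {
    "Accept-Encoding": "gzip, deflate, compress",
    "Accept": "*/*",
    "User-Agent": "python-requests/1.2.0 CPython/2.7.3 Linux/3.2.0-39-generic",
}


def diff_headers(expected, actual):
    """Return (missing, different): names absent from actual, and names whose values differ."""
    # Header names are case-insensitive, so compare them lowercased.
    actual_lower = {name.lower(): value for name, value in actual.items()}
    missing, different = [], []
    for name, value in expected.items():
        if name.lower() not in actual_lower:
            missing.append(name)
        elif actual_lower[name.lower()] != value:
            different.append(name)
    return sorted(missing), sorted(different)


missing, different = diff_headers(browser_headers, script_headers)
print("missing:", missing)
print("different:", different)
```

Whether any of these differences matters to the server is not known from the captures; the diff only narrows down what could be tried.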
Additional information: the response headers
{'content-length': '5337', 'content-encoding': 'gzip', 'expires': 'Mon, 26 Jul 1997 05:00:00 GMT', 'vary': 'Accept-Encoding', 'server': 'Apache/2', 'last-modified': 'Thu, 04 Apr 2013 19:36:58 +0200', 'connection': 'close', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'date': 'Thu, 04 Apr 2013 17:36:58 GMT', 'x-powered-by': 'PHP/5.3.3-7+squeeze14', 'content-type': 'text/html; charset=ISO-8859-2'}
The response content (because the response has no "body" method):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
<meta http-equiv="EXPIRES" content="0">
<meta name="RESOURCE-TYPE" content="DOCUMENT">
<meta name="DISTRIBUTION" content="GLOBAL">
<meta name="AUTHOR" content="TS TEAM">
<meta name="COPYRIGHT" content="Copyright (c) 2008 by TS TEAM">
<meta name="KEYWORDS" content="darmowy hosting, hosting plików, darmowy hosting plików, bezpłatny hosting, hosting obrazków, hosting zdjęć, darmowy, hosting mp3, ftp hosting, darmowy serwer">
<meta name="DESCRIPTION" content="Bezpłatny hosting plików, obrazków i zdjęć bez konieczności ponoszenia dodatkowych opłat. Pozwalamy na darmowe dzielenie się plikami z innymi użytkownikami internetu.">
<meta name="ROBOTS" content="ALL, FOLLOW">
<meta http-equiv="CONTENT-Language" content="pl">
<meta name="REVISIT-AFTER" content="1 DAYS">
<meta name="RATING" content="GENERAL">
<meta name="GENERATOR" content="TS TEAM">
<title>Darmowy hosting plików, zdjęć i obrazków - Hostuje.net</title>
<link rel="stylesheet" href="style.css?new8ac" type="text/css" media="all">
<!--[if lte IE 6]> <link rel="stylesheet" href="style_ie.css?new2" type="text/css"/><![endif]-->
<script type="text/javascript" src="swfobject2.js?sr"></script>
</head>
<body><a name="pagetop"></a>
<center> <!-- top -->
<div id="top">
<a id="top_logo" href="index.php"></a>
<ul id="menu_ul">
<li id="menu_li_home"><a id="menu_home" href="index.php">Home</a></li>
<li id="menu_li_rejestracja"><a id="menu_rejestracja" href="regulamin.php">Regulamin</a></li>
<li id="menu_li_pomoc"><a id="menu_pomoc" href="pomoc.php">Pomoc</a></li>
<li id="menu_li_onas"><a id="menu_onas" href="onas.php">O Nas</a></li>
<li id="menu_li_kontakt"><a id="menu_kontakt" href="kontakt.php">Kontakt</a></li>
</ul>
</div>
<!--/top -->
<!-- mde -->
<div class="mde">
<!-- mdel -->
<div class="mdel">
<div id="register">
<br>
<div><b style="font-size: 26px">Darmowy</b> <b style="color: #CA2121; font-size: 26px">hosting</b> <b style="font-size: 26px">plików</b></div>
<table id="register_tab" width="348" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="174"><p><b style="font-size:14px;">- Upload do 250</b> <b style="font-size:14px;">MB</b></p></td>
<td width="174"><p><b style="font-size:14px;">- Szybkie Serwery</b></p></td>
</tr>
<tr>
<td><p style="margin-top:5px;"><b style="font-size:14px;">- Brak Limitów</b></p></td>
<td><p style="margin-top:5px;"><b style="font-size:14px;">- Nielimitowana przestrzeń na pliki</b></p></td>
</tr>
<tr>
<td><p style="margin-top:5px;"><b style="font-size:14px;">- Wszystko za Darmo</b></p></td><td><p style="margin-top:5px;"><b style="font-size:14px;">- Nielimitowany transfer</b></p></td> </tr>
</table>
</div>
</div>
<!--/mdel -->
<!-- mder -->
<div class="mder">
<div id="login">
<form id="forma" name="forma" method="post" action="login.php">
<label id="forma_label_title">Logowanie do e-Dysku</label>
<input type="text" name="login" id="forma_login" value="" style="height:19px;">
<input type="password" name="pass" id="forma_pwd" value="" style="height:19px;">
<label id="forma_label_login">Login:</label>
<label id="forma_label_pwd">Hasło:</label>
<label id="forma_label_checkbox">Zapamiętaj</label>
<label id="forma_label_remember"><a href="rejestracja.php">Załóż konto</a> | <a href="przypomnij.php">Przypomnij hasło</a></label>
<input type="checkbox" name="zapisz" id="forma_checkbox" value="1">
<input name="forma_button_login" id="formm_button_login" type="image" src="graph/home/login/forma_button.gif" value="" onmouseover="this.style.cursor='pointer';">
</form>
</div>
</div>
<!--/mder -->
</div>
<!--/mde -->
<br>
<!-- BBelements AdServer one2many Position: go.arbopl.bbelements.com hostuje.net(5362) / pozostale podstrony(2) / billboard(1) / billboard 750x100 750x100(30)-->
<!-- HowTo: Place this code into the html page there the ad should appear. -->
<div id='bmone2n-5362.2.1.30'></div>
<div align="center">
<br>
<table width="100%"><tr><td align="center"><table width="700"><tr><td align="left" valign="top" style="width: 420px"><b>Wystąpiły następujące błędy:</b><ul><li style="margin-right: 4px;">Twoja sesja wygasła, proszę kliknąć ponownie na przycisk pobierz plik lub odświeżyć strone.</li></ul><form method="post" action="http://hostuje.net/file.php?id=c4f50ccefcd1c83e2a48374db4644a0a"><div><img style="width:0px;height:0px;" src="obraz.php" alt="" title=""><table cellpadding="4" cellspacing="4">
<tr><td colspan="2">
<div id='bmone2n-5362.2.1.36'></div>
</td></tr>
<tr><td colspan="2" style="border-top: 2px solid #e5e5e5;"><input type="hidden" name="OK" value="1"><input type="hidden" name="hasz" value="c4f50ccefcd1c83e2a48374db4644a0a"><input type="hidden" name="id" value="c4f50ccefcd1c83e2a48374db4644a0a"><input type="hidden" name="name" value="Stupid Fresh- Spam Javelin Original.mp3"><input type="hidden" name="mime" value="audio/mpeg"><input type="hidden" name="k" value="n4.hostuje.org"><input type="hidden" name="REG" value="1"><input type="submit" value="Pobierz Plik" style="font-size: 22px; border: 2px solid #eeeeee; cursor: hand;"></td></tr></table></div></form></td><td valign="top" align="left" style="border-top: 2px solid #e5e5e5; border-left: 2px solid #e5e5e5; padding: 8px; width: 280px; background: #ffffff url(images/tlo_2a.gif) repeat-y;"><div style="font-size: 18px;"><span style="font-size: 18px; border-bottom: 2px solid #eeeeee;">Informacje o pliku</span></div><br><b>Plik:</b> Stupid Fresh- Spam Javelin Original.mp3<br><b>Rozmiar:</b> 293 KB<br><b>Rozmiar kontrolny w bajtach:</b> 299 676<br><b>Typ:</b> audio/mpeg<br><script type="text/javascript" src="swfobject.js"></script><br> <script type="text/javascript" charset="iso-8859-2">if (document.getElementById('adKontekst_0')==null) {
var adKontekst_pola=new Array;
document.write('<scr'+'ipt type="text/javascript" charset="iso-8859-2" '+
'src="http://adsearch.adkontekst.pl/akon/spliter?prid=5343&caid=94710&ns='
+(new Date()).getTime()+'"'+'></'+'scri'+'pt>');
}
var adc_i=adKontekst_pola.length;
document.write("<div id='adKontekst_"+adc_i+"'> </"+"div>");
adKontekst_pola[adc_i]=new Object();
adKontekst_pola[adc_i].nazwa="adKontekst_"+adc_i;
adKontekst_pola[adc_i].typ=201;
adKontekst_pola[adc_i].r=1;
adKontekst_pola[adc_i].c=1;
adKontekst_pola[adc_i].x=234;
adKontekst_pola[adc_i].y=60;
adKontekst_pola[adc_i].naroznik_lewy_gorny = 3; adKontekst_pola[adc_i].naroznik_prawy_gorny = 3; adKontekst_pola[adc_i].naroznik_lewy_dolny = 3; adKontekst_pola[adc_i].naroznik_prawy_dolny = 3; adKontekst_pola[adc_i].spacing=1; adKontekst_pola[adc_i].scalenie=2;
adKontekst_pola[adc_i].paleta=new Object();
adKontekst_pola[adc_i].paleta.kolor_tlo="#FFFFFF";
adKontekst_pola[adc_i].paleta.kolor_tytul="#414e5b";
adKontekst_pola[adc_i].paleta.kolor_opis="#414e5b";
adKontekst_pola[adc_i].paleta.kolor_url="#414e5b";
adKontekst_pola[adc_i].paleta.kolor_naglowek="#FFFFFF";
adKontekst_pola[adc_i].paleta.kolor_tlo_naglowka="#FFFFFF";
adKontekst_pola[adc_i].paleta.kolor_ramki_naglowka="#FFFFFF";
adKontekst_pola[adc_i].metka="text";
adKontekst_pola[adc_i].czy_url=true;
adKontekst_pola[adc_i].id_koloru_metki="1";
adKontekst_pola[adc_i].nform=8;
</script>
<script type="text/javascript"
src="http://hostuje.net/show_s.php?1365097018&h=c4f50ccefcd1c83e2a48374db4644a0a">
</script>
<div id="flashcontent">Do poprawnego działania playera wymagany jest flash i włączona obsługa javascriptu.</div><script type="text/javascript">var so = new SWFObject("mediaplayer.swf", "player", "250", "20", "8", "#ffffff");so.addParam("allowscriptaccess", "always");so.addVariable("file", "http://n4.hostuje.org/_tmp/id-c4f50ccefcd1c83e2a48374db4644a0a-p-1");so.addVariable("width", "250");so.addVariable("height", "20"); so.addVariable("bufferlength", "5"); so.addVariable("displaywidth", "0");so.addVariable("displayheight", "0");so.addVariable("autostart", "false"); so.addVariable("frontcolor", "0x414e5b");so.addVariable("backcolor", "0xffffff");so.addVariable("lightcolor", "0xCA2121"); so.addVariable("type", "mp3");so.write("flashcontent");</script>Po wciśnięciu play poczekaj chwilę aż plik się wczyta...<br><br><br><div style="padding-top:2px;"><p style="padding-top:2px;"><img src="images/raport.png" style="vertical-align: middle;" border="0" width="16" height="16" alt=""> <b><a href="kontakt.php" style="text-decoration:underline;">Zgłoś naruszenie zasad / Raport Abuse</a></b><br></p><p style="padding-top:2px;"><img src="images/dodatki.png" style="vertical-align: middle;" border="0" width="16" height="16" alt=""> <b><a href="#pagetop" onclick="show('okno')" style="text-decoration:underline;">Kody i dodatki html</a></b><br></p><p style="padding-top:2px;"><img src="images/komentarz.png" style="vertical-align: middle;" border="0" width="16" height="16" alt=""> <b><a href="#pagetop" onclick="show('okno2')" style="text-decoration:underline;">Dodaj komentarz</a></b></p></div><br><hr style="border: 0;color: #f2f2f2;background-color: #f2f2f2;height: 1px;"><b>Ten plik został przeskanowany programem antywirusowym. Skaner antywirusowy log:</b><br>Czysty plik, nie wykryto zagrożeń<script language="javascript" type="text/javascript">
<!--
document.title = "Stupid Fresh- Spam Javelin Original.mp3 : Darmowy hosting plików - Hostuje.net";
//-->
</script></td></tr></table></td></tr></table>
</div>
<br>
<!-- footer -->
<div id="footer_line"></div>
<div id="footer">
<!-- r -->
<p align="center">
Korzystanie z serwisu jest równoznaczne z akceptacją <a class="link_red" href="regulamin.php" target="_self">regulaminu</a><span class="footer_sep">|</span> <a class="link_red" href="polityka_cookies.php" target="_self">Polityka cookies</a><span class="footer_sep">|</span> <a class="link_red" href="kontakt.php" target="_self">Kontakt / Raport Abuse</a><span class="footer_sep">|</span> <a class="link_black" href="reklama.php" target="_self">Reklama</a><span class="footer_sep">|</span> </p>
<p class="color_grey" align="center" style="margin-top:3px; font-size:10px;">Copyright by Hostuje.net © 2008-2013 All rights reserved | powered by TS Team ver.1.4.8 | <a style="font-size:10px;" href="http://hostuje.net/cookies_del.php">Usuń ciasteczka z hostuje.net</a>
</p>
</div>
<!--/footer -->
<!-- (C) 2000-2006 Gemius SA - gemiusAudience / ver 11.1 / pp.hostuje.net / podstrony-->
<script type="text/javascript">
<!--//--><![CDATA[//><!--
var pp_gemius_identifier = new String('ctTqJgtuY5.LA18hzSqwk8bfjwiwbFtPWF8Iwxo_IGz.Q7');
//--><!]]>
</script>
<script type="text/javascript" src="http://arbo.hit.gemius.pl/pp_gemius.js"></script>
<div id="google_translate_element"></div><script type="text/javascript">
function googleTranslateElementInit() {
new google.translate.TranslateElement({
pageLanguage: 'pl'
}, 'google_translate_element');
}
</script><script type="text/javascript" src="http://translate.google.com/translate_a/element.js?cb=googleTranslateElementInit"></script>
<!-- BBelements AdServer one2many code -->
<!-- HowTo: All the ad codes must be above this code! -->
<!-- HowTo: It is good idea to place this code just below the last ad code. -->
<script type='text/javascript' charset='utf-8' src='http://go.arbopl.bbelements.com/bb/bb_one2n.js'></script>
<script type='text/javascript'>/* <![CDATA[ */
bmone2n.addPosition('5362.2.1.30','');
bmone2n.addPosition('5362.2.1.36','');
bmone2n.addPosition('5362.2.1.31','');
bmone2n.getAd('go.arbopl.bbelements.com','',''+(typeof(bburlparam)=='string'?'&'+bburlparam:''),'utf-8');
/* ]]> */</script>
<div id='bmone2t-5362.2.1.30' style='display:none'><script type='text/javascript'>/* <![CDATA[ */ bmone2n.makeAd('5362.2.1.30'); /* ]]> */</script></div>
<div id='bmone2t-5362.2.1.36' style='display:none'><script type='text/javascript'>/* <![CDATA[ */ bmone2n.makeAd('5362.2.1.36'); /* ]]> */</script></div>
<div id='bmone2t-5362.2.1.31' style='display:none'><script type='text/javascript'>/* <![CDATA[ */ bmone2n.makeAd('5362.2.1.31'); /* ]]> */</script></div>
<script type='text/javascript'>/* <![CDATA[ */ bmone2n.moveAd(); /* ]]> */</script>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-4997701-1");
pageTracker._initData();
pageTracker._trackPageview();
</script>
<script type="text/javascript" src="dll_kody.js.php?new7a&flash=1"></script><div id="okno"><div class="tlo"></div><div class="content"><div align="right" class="head"><div class="title">BBcode & Kody</div><a onclick="hide('okno'); return false;" href="#" style="color: #f2f2f2;">zamknij <img src="images/zamknij2.gif" border="0" style="vertical-align: top;" width="13" height="13" alt=""></a></div><form action="#" style="margin: 0px; padding: 0px;"><div style="padding: 10px;"><b style="font-size: 18px;">Kody i dodatki html</b><br><br>Wybierz kod, który najabardziej Ci odpowiada, możesz zamieścić go na swoim blogu stronie lub forum.<br><br><b>Link html do zamieszczenia na stronie, blogu:</b><br><textarea name="url_html" class="area" onclick="this.focus();this.select();" cols="80" rows="1" style="border: 2px solid #C0C0C0;"><a href="http://hostuje.net/file.php?id=c4f50ccefcd1c83e2a48374db4644a0a" title="Stupid Fresh- Spam Javelin Original.mp3">Stupid Fresh- Spam Javelin Original.mp3</a></textarea><br><b>Link BBcode do zamieszczenia na forum:</b><br><textarea name="url_html" class="area" onclick="this.focus();this.select();" cols="80" rows="1" style="border: 2px solid #C0C0C0;">[url=http://hostuje.net/file.php?id=c4f50ccefcd1c83e2a48374db4644a0a]Stupid Fresh- Spam Javelin Original.mp3[/url]</textarea></div></form></div></div><div id="okno2"><div class="tlo"></div><div class="content"><div align="right" class="head"><div class="title">Dodaj komentarz</div><a onclick="hide('okno2'); return false;" href="#" style="color: #f2f2f2;">zamknij <img src="images/zamknij2.gif" border="0" style="vertical-align: top;" width="13" height="13" alt=""></a></div><div style="padding: 10px;"><b style="font-size: 18px;">Zaloguj się aby móc dodawać komentarze...</b><br><br>Nie masz jeszcze konta w Hostuje.net? <a href="rejestracja.php" style="text-decoration:underline">Załóż je tutaj za darmo</a>.<br><br></div></div></div>
</center>
<!-- cXense script begin -->
<div id="cX-root" style="display:none"></div>
<script type="text/javascript">
    var cX = cX || {}; cX.callQueue = cX.callQueue || [];
    cX.callQueue.push(['setAccountId', '9222291968715707916']);
    cX.callQueue.push(['setSiteId', '9222291968715707918']);
    cX.callQueue.push(['sendPageViewEvent']);
</script>
<script type="text/javascript">
    (function() { try { var scriptEl = document.createElement('script'); scriptEl.type = 'text/javascript'; scriptEl.async = 'async';
    scriptEl.src = ('https:' == document.location.protocol) ? 'https://scdn.cxense.com/cx.js' : 'http://cdn.cxense.com/cx.js';
    var targetEl = document.getElementsByTagName('script')[0]; targetEl.parentNode.insertBefore(scriptEl, targetEl); } catch (e) {};} ());
</script>
<!-- cXense script end -->
</body>
</html> <!-- 0.0020761489868164 -->
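The page declares charset=ISO-8859-2, so a script that wants to detect the session-expired message should decode the body with that charset before searching it. A sketch that simulates the raw bytes locally; `SESSION_EXPIRED_MARKER` and `session_expired` are hypothetical names, and the marker text is taken from the error shown on the page.

```python
# Simulate the raw bytes the server would send for the error message.
raw_body = "<li>Twoja sesja wygasła, proszę kliknąć ponownie</li>".encode("iso-8859-2")

# Decoding as UTF-8 replaces each non-ASCII byte with U+FFFD.
wrong = raw_body.decode("utf-8", errors="replace")
# Decoding with the declared charset restores the Polish text.
right = raw_body.decode("iso-8859-2")

SESSION_EXPIRED_MARKER = "Twoja sesja wygasła"  # hypothetical constant


def session_expired(page_text):
    """Return True when the page text carries the session-expired message."""
    return SESSION_EXPIRED_MARKER in page_text


print(session_expired(wrong), session_expired(right))
```

With Requests, the same idea would be to call `resp.content.decode("iso-8859-2")` and pass the result to the check.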
Another update (r.request.headers and r.request.body):
{'Content-Length': u'164', 'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/1.2.0 CPython/2.7.3 Linux/3.2.0-39-generic', 'Cookie': 'PHPSESSID=teir11avhfk543aapcmmgrn2a5', 'Content-Type': 'application/x-www-form-urlencoded'}
OK=1&name=Stupid+Fresh-+Spam+Javelin+Original.mp3&k=n4.hostuje.org&id=c4f50ccefcd1c83e2a48374db4644a0a&mime=audio%2Fmpeg&hasz=c4f50ccefcd1c83e2a48374db4644a0a&REG=1
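Because the script and the browser list the fields in a different order, a byte-for-byte comparison of the two bodies is not meaningful. A sketch, using only `urllib.parse`, that compares them as decoded field sets; the last field is written as `&REG=1`, matching the hidden input of the form.

```python
from urllib.parse import parse_qs

# Body sent by the browser (from the capture).
browser_body = (
    "OK=1&hasz=c4f50ccefcd1c83e2a48374db4644a0a"
    "&id=c4f50ccefcd1c83e2a48374db4644a0a"
    "&name=Stupid+Fresh-+Spam+Javelin+Original.mp3"
    "&mime=audio%2Fmpeg&k=n4.hostuje.org&REG=1"
)
# Body sent by the script (from r.request.body).
script_body = (
    "OK=1&name=Stupid+Fresh-+Spam+Javelin+Original.mp3"
    "&k=n4.hostuje.org&id=c4f50ccefcd1c83e2a48374db4644a0a"
    "&mime=audio%2Fmpeg&hasz=c4f50ccefcd1c83e2a48374db4644a0a&REG=1"
)

# parse_qs returns dicts, so field order no longer matters in the comparison.
same_fields = parse_qs(browser_body) == parse_qs(script_body)
print(same_fields)
```

If the parsed fields match, the form data itself is unlikely to be what triggers the session-expired page, which points toward something else in the exchange. That is an inference, not something the captures prove.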