
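I'm at my wits' end trying to scrape a secured (i.e. login-protected) page with cURL. I have scraped two other sites successfully with almost no problems, but I cannot log in to this one. cURL fetches every page I ask it for, but none of them are in a logged-in state, which is no use. So maybe someone can spot the mistake I'm missing?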

The code is:

$url_to = 'http://fastorder.newrock.es/store2009/index.php/customer/account/loginPost/';
$url_from = 'http://fastorder.newrock.es/store2009/index.php/customer/account/login/';
$url_get = 'http://fastorder.newrock.es/store2009/index.php/';
$name_pass = 'login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=';

function login($link,$user,$from) {
    $fp = fopen("cookie.txt", "w");
    fclose($fp);
    $log = curl_init();
    curl_setopt($log, CURLOPT_REFERER, $from);
    curl_setopt($log, CURLOPT_URL, $link);
    curl_setopt($log, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($log, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($log, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6");
    curl_setopt($log, CURLOPT_TIMEOUT, 40);
    curl_setopt($log, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($log, CURLOPT_HEADER, TRUE);
    curl_setopt($log, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($log, CURLOPT_POST, TRUE);      
    curl_setopt($log, CURLOPT_POSTFIELDS, $user);
    $data = curl_exec($log);
    curl_close($log);
}

login($url_to,$name_pass,$url_from);

function get($url) {
    $get = curl_init();
    curl_setopt($get, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($get, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($get, CURLOPT_URL, $url);
    $data = curl_exec($get);
    curl_close($get);   // previously unreachable because it came after the return
    return $data;
}

$html = get($url_get);
echo $html;

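This is (more or less) the same script that works on the other two sites, where it logs in fine. What threw me at first was $name_pass. It turns out the site names its username and password input fields login[username] and login[password]. Why, I have no idea, but I have tried sending the names both URL-encoded and with literal brackets, and neither helped.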
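Bracketed field names such as login[username] are how PHP forms submit nested arrays: on a PHP server the two fields arrive as one $_POST['login'] array. As a minimal sketch, the encoded body can be generated with http_build_query instead of being written by hand. The credentials below are placeholders, not values from the original post.

```php
<?php
// Placeholder credentials, for illustration only.
$fields = array(
    'login' => array(
        'username' => 'user@example.com',
        'password' => 'secret',
    ),
    'send' => '',
);

// http_build_query() percent-encodes the brackets and the values.
// The separator is passed explicitly so the result does not depend on php.ini.
$name_pass = http_build_query($fields, '', '&');

echo $name_pass, "\n";
// login%5Busername%5D=user%40example.com&login%5Bpassword%5D=secret&send=
```

The resulting string has the same shape as the body shown by Live HTTP Headers, so it can be passed to CURLOPT_POSTFIELDS unchanged.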

Live HTTP Headers gives me the following for the page:

http://fastorder.newrock.es/store2009/index.php/customer/account/loginPost/

POST /store2009/index.php/customer/account/loginPost/ HTTP/1.1
Host: fastorder.newrock.es
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://fastorder.newrock.es/store2009/index.php/customer/account/login/
Cookie: frontend=6tjul97q4mvn0046ier0k79li8
Content-Type: application/x-www-form-urlencoded
Content-Length: 81

login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=

HTTP/1.1 302 Found
Date: Fri, 26 Feb 2010 12:29:19 GMT
Server: Apache/2.0.63 (CentOS)
X-Powered-By: PHP/5.2.10
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://fastorder.newrock.es/store2009/index.php/customer/account/
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
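
One detail in this capture is that the browser already sends a Cookie: frontend=... header with the POST, which it presumably received when it loaded the login page. The script posts without requesting that page first. Whether this is the cause here is unverified; the following is only a sketch of fetching the login page and then posting on the same handle. The helper names base_options and login_with_session are hypothetical and not part of the original script.

```php
<?php
// Options shared by both requests (hypothetical helper).
function base_options($jar, $agent) {
    return array(
        CURLOPT_COOKIEJAR      => $jar,   // cookies are written here when the handle closes
        CURLOPT_COOKIEFILE     => $jar,   // turns on cURL's cookie engine
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 40,
    );
}

function login_with_session($url_from, $url_to, $name_pass, $jar, $agent) {
    $ch = curl_init();
    curl_setopt_array($ch, base_options($jar, $agent));

    // Step 1: GET the login page so the server can set its session cookie.
    curl_setopt($ch, CURLOPT_URL, $url_from);
    curl_exec($ch);

    // Step 2: POST the form on the same handle, so that cookie is sent back.
    curl_setopt($ch, CURLOPT_URL, $url_to);
    curl_setopt($ch, CURLOPT_REFERER, $url_from);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $name_pass);
    $page = curl_exec($ch);

    curl_close($ch);   // flushes the cookies to $jar
    return $page;
}
```

With the variables from the script above, a call would look like login_with_session($url_from, $url_to, $name_pass, 'cookie.txt', $agent), where $agent is the same User-Agent string the script already uses.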

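I have tried copying everything I could into the cURL script, thinking the site uses some obscure mechanism to stop scripts from logging in. But now I'm completely stuck and don't know what to try next. I have been through plenty of tutorials, and they all give advice that worked like a charm on the first two sites.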

Help?


3 Answers


Suggestion: use Fiddler (www.fiddler2.com) to diff the request traffic between cURL and your browser.
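
To get cURL's side of that comparison, the request headers it actually sent can be captured by setting CURLINFO_HEADER_OUT to true and reading curl_getinfo($ch, CURLINFO_HEADER_OUT) after the transfer, or the script can be routed through Fiddler with CURLOPT_PROXY set to 127.0.0.1:8888 (Fiddler's default listening port). As a rough sketch, two raw header blocks can then be compared by header name. The helper names and the sample headers below are made up for illustration.

```php
<?php
// Return the lower-cased header names found in a raw request header block.
function header_names($raw) {
    $names = array();
    foreach (preg_split('/\r\n|\n/', trim($raw)) as $i => $line) {
        if ($i === 0) {
            continue;   // skip the request line, e.g. "POST /path HTTP/1.1"
        }
        $pos = strpos($line, ':');
        if ($pos !== false) {
            $names[] = strtolower(trim(substr($line, 0, $pos)));
        }
    }
    return $names;
}

// Headers present in the browser request but absent from cURL's request.
function missing_headers($browser_raw, $curl_raw) {
    return array_values(array_diff(header_names($browser_raw), header_names($curl_raw)));
}

// Made-up sample data; real input would come from the browser capture
// and from curl_getinfo($ch, CURLINFO_HEADER_OUT).
$browser = "POST /login HTTP/1.1\r\nHost: example.com\r\nCookie: frontend=abc\r\nReferer: http://example.com/\r\n";
$curl    = "POST /login HTTP/1.1\r\nHost: example.com\r\nReferer: http://example.com/\r\n";

print_r(missing_headers($browser, $curl));   // lists "cookie"
```

This only compares names, not values, so it catches a missing Cookie or Referer header but not a wrong one.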

Answered 2010-02-26T15:18:24.127

It might be this:

login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=

I'm no curl guru, but your script looks fine, so maybe you shouldn't escape those characters.

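I would test locally with curl against a login form of this kind. Maybe you can work out what is going wrong from there. If I'm right, the fields will arrive blank.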
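
A rough local check that needs no web server is parse_str, which decodes a urlencoded string the way PHP decodes a query string. This only approximates what the real server does with the form body, and the credentials are placeholders. If both variants decode to the same array, the escaping is probably not what breaks the login.

```php
<?php
// The same form body, once with escaped brackets and once with literal ones.
$encoded = 'login%5Busername%5D=user&login%5Bpassword%5D=secret&send=';
$literal = 'login[username]=user&login[password]=secret&send=';

parse_str($encoded, $from_encoded);
parse_str($literal, $from_literal);

// Compare the two decoded arrays, then show what the fields look like.
var_dump($from_encoded === $from_literal);
print_r($from_encoded);
```

A fuller test would post both bodies with cURL to a local page that prints $_POST and compare what arrives.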

Answered 2010-02-26T15:09:39.673

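There is a problem with this store's registration/login. The activation email says to simply log in to activate the account. I have tried logging in several times, but every time I get the error "This account is not activated".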

Below is a quick change that prints the page returned by the login request.

$url_to = 'http://fastorder.newrock.es/store2009/index.php/customer/account/loginPost/';
$url_from = 'http://fastorder.newrock.es/store2009/index.php/customer/account/login/';
$url_get = 'http://fastorder.newrock.es/store2009/index.php/';
$name_pass = 'login%5Busername%5D=*****&login%5Bpassword%5D=*****&send=';

function login($link,$user,$from) {
    $fp = fopen("cookie.txt", "w");
    fclose($fp);
    $log = curl_init();
    curl_setopt($log, CURLOPT_REFERER, $from);
    curl_setopt($log, CURLOPT_URL, $link);
    curl_setopt($log, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($log, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($log, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6");
    curl_setopt($log, CURLOPT_TIMEOUT, 40);
    curl_setopt($log, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($log, CURLOPT_HEADER, TRUE);
    curl_setopt($log, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($log, CURLOPT_POST, TRUE);
    curl_setopt($log, CURLOPT_POSTFIELDS, $user);
    $data = curl_exec($log);
    curl_close($log);
    return $data;
}

echo login($url_to,$name_pass,$url_from);

function get($url) {
    $get = curl_init();
    curl_setopt($get, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($get, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($get, CURLOPT_URL, $url);
    $data = curl_exec($get);
    curl_close($get);   // previously unreachable because it came after the return
    return $data;
}

$html = get($url_get);
echo $html;

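Edit:
Is the cookie data being written to the cookie file (cookie.txt)? If not...

  1. Check the file permissions and make sure the file is writable.

  2. A bug in early versions of PHP 5 caused the cookie file options to be ignored.

Details of the bug are here: http://bugs.php.net/bug.php?id=33475
Workaround: add unset($log); after curl_close($log);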
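
The checks above can be sketched as a small diagnostic to run after login(). cURL writes the jar in the Netscape cookie file format, with seven tab-separated columns per cookie. The function name jar_cookie_names and the use of an absolute path are assumptions for illustration, not part of the original script.

```php
<?php
// Return the cookie names stored in a Netscape-format cookie jar file,
// or null when the file is missing or unreadable.
function jar_cookie_names($jar) {
    if (!is_readable($jar)) {
        return null;
    }
    $names = array();
    foreach (file($jar, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        // cURL marks HttpOnly cookies with "#HttpOnly_"; other "#" lines are comments.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue;
        }
        $cols = explode("\t", $line);
        if (count($cols) === 7) {
            $names[] = $cols[5];   // columns: domain, flag, path, secure, expiry, name, value
        }
    }
    return $names;
}

// An absolute path avoids depending on the script's working directory.
$jar = __DIR__ . '/cookie.txt';
if (file_exists($jar) ? !is_writable($jar) : !is_writable(dirname($jar))) {
    echo "Cookie jar is not writable: $jar\n";
} else {
    $names = jar_cookie_names($jar);
    echo $names ? "Cookies saved: " . implode(', ', $names) . "\n" : "No cookies saved in $jar\n";
}
```

If the jar stays empty even though the file is writable, the unset($log); workaround mentioned above is the next thing to try.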

Without being able to test it, this script is hard to debug.

Answered 2010-02-26T18:27:41.473