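I am trying to log in to an ets.org/toefl account using PHP cURL, but the login fails. I usually get an error message saying the server is busy, yet logging in from a browser works. My code is below. Can anyone see what is wrong?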

<?php
include('simple_html_dom.php');

$login_url = 'https://toefl-registration.ets.org/TOEFLWeb/logon.do';

$username='****';
$password='***';
$ck = 'cookie.txt';

$agent = 'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0';
// extra headers
$headers[] = "Connection: keep-alive";
//$headers[]= "Accept-Encoding: gzip, deflate";


$ch = curl_init();

curl_setopt($ch, CURLOPT_HEADER,  0);
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 

curl_setopt($ch, CURLOPT_COOKIEJAR, $ck);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ck);

// Fetch the login page first so the session cookie and Struts token are available.
curl_setopt($ch, CURLOPT_URL, 'https://toefl-registration.ets.org/TOEFLWeb/extISERLogonPrompt.do');

$output = curl_exec($ch);
//echo $output;

$html = new simple_html_dom();
$html = str_get_html($output);
$e = $html->find(".loginform");
$a = $e[0]->find('input');
$str = $a[0]->outertext;
// Non-greedy match so attributes after value="..." are not swallowed.
preg_match('/value="([^"]*)"/', $str, $match);
$h_attr = $match[1];

$fields['org.apache.struts.taglib.html.TOKEN'] = $h_attr;
$fields['currentLocale']= 'en_US';
$fields['username'] = $username;
$fields['password'] = $password;
$fields['x'] = 11;
$fields['y'] = 4;
//print_r($fields);
//echo "\r\n";
$POSTFIELDS = http_build_query($fields); 
//echo $POSTFIELDS;

$headers[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$headers[] = "Accept-Language: en-US,en;q=0.5";
$headers[]="Referer: https://toefl-registration.ets.org/TOEFLWeb/extISERLogonPrompt.do";

curl_setopt($ch, CURLOPT_URL, $login_url); 
curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $POSTFIELDS); 
$result = curl_exec($ch);
print $result;

(Update from comments)

POST data sent by the browser:

org.apache.struts.taglib.html.TOKEN=c1b88957e9914492fe8cc20b33ef1cdd&currentLocale=en_US&username=name&password=pass&x=23&y=3

POST data sent by my script:

org.apache.struts.taglib.html.TOKEN=345a9f935b2db8a69f55c5b4d3372190&currentLocale=en_US&username=name&password=pass&x=11&y=4

Details of the POST generated by PHP cURL:

POST /TOEFLWeb/logon.do HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0
Host: toefl-registration.ets.org
Cookie: au=MTM3Mjc4ODQwMg%3d%3d; server=3; JSESSIONID=23C39022E2641B8F5AC944295837315E
Connection: keep-alive
Accept: */*
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Referer: toefl-registration.ets.org/TOEFLWeb/extISERLogonPrompt.do
Content-Length: 134
Content-Type: application/x-www-form-urlencoded

2 Answers

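Try comparing the HTTP headers sent by your cURL script with the headers sent by the browser (use Chrome dev tools). The remote server may be rejecting you because some header information is missing.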
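
One way to make that comparison less error-prone is to diff the two header sets in code. The sketch below is only an illustration: `parse_raw_headers` and `missing_headers` are hypothetical helper names, and the two sample requests are made up. In a real run, the script-side string could come from `curl_getinfo($ch, CURLINFO_HEADER_OUT)` after setting `CURLINFO_HEADER_OUT` to `true`, and the browser-side string could be copied from the dev tools network tab.

```php
<?php
// Parse a raw HTTP request header block into a map of
// lower-cased header name => list of values.
function parse_raw_headers(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n|\r/', trim($raw)) as $line) {
        // Skip the request line ("POST /path HTTP/1.1") and lines without a colon.
        if (preg_match('#^[A-Z]+ \S+ HTTP/#', $line) || strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))][] = trim($value);
    }
    return $headers;
}

// Names of headers that the browser sent but the script did not.
function missing_headers(string $browserRaw, string $scriptRaw): array
{
    $missing = array_diff_key(parse_raw_headers($browserRaw), parse_raw_headers($scriptRaw));
    return array_keys($missing);
}

// Hypothetical sample data, for illustration only.
$browserRaw = "POST /TOEFLWeb/logon.do HTTP/1.1\r\n"
    . "Host: toefl-registration.ets.org\r\n"
    . "Accept-Encoding: gzip, deflate\r\n"
    . "Connection: keep-alive\r\n";
$scriptRaw = "POST /TOEFLWeb/logon.do HTTP/1.1\r\n"
    . "Host: toefl-registration.ets.org\r\n"
    . "Connection: keep-alive\r\n";

print_r(missing_headers($browserRaw, $scriptRaw));
```

Header names are compared case-insensitively because HTTP header field names are case-insensitive. Values are kept as lists so that repeated headers, such as two `Accept` lines, remain visible.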

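Make sure the cookie file has full permissions. From php.net:

When specifying the CURLOPT_COOKIEFILE or CURLOPT_COOKIEJAR options, don't forget to "chmod 777" the directory where the cookie file must be created.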
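
Before changing permissions, it may help to check whether the cookie location is actually the problem. This is a minimal sketch, assuming the script runs as the same user as the web server. `cookie_jar_problems` is a hypothetical helper name, and the temp-directory path is just an example location.

```php
<?php
// Return a list of problems that would stop cURL from writing the cookie jar.
function cookie_jar_problems(string $cookieFile): array
{
    $problems = [];
    $dir = dirname($cookieFile);
    if (!is_dir($dir)) {
        $problems[] = "Directory does not exist: $dir";
    } elseif (!is_writable($dir)) {
        $problems[] = "Directory is not writable: $dir";
    }
    // An existing file must itself be writable, regardless of the directory.
    if (file_exists($cookieFile) && !is_writable($cookieFile)) {
        $problems[] = "Cookie file is not writable: $cookieFile";
    }
    return $problems;
}

// Example location (an assumption); an absolute path avoids surprises
// when the script's working directory is not what you expect.
$ck = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'cookie.txt';
foreach (cookie_jar_problems($ck) as $problem) {
    echo $problem, PHP_EOL;
}
```

If the function returns an empty array, the cookie jar location is probably not what is blocking the login.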

Answered 2013-07-02T18:04:06.370

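I got it working somehow. I added certificate verification to the code. I also found that there needs to be some delay between the two functions, the one that fetches the cookie and the one that logs in. The working code is below: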

<?php
include('simple_html_dom.php');

$login_url = 'https://toefl-registration.ets.org/TOEFLWeb/logon.do';
$cookie_page = 'https://toefl-registration.ets.org/TOEFLWeb/extISERLogonPrompt.do';

$username='******';
$password='******';

//$ck = 'E:\Projects\Web Development\toefl_script\cookie.txt';
$ck = 'D:\Nikhil\Projects\Wamp\toeflscript\cookie.txt';

//$agent = 'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0';
$agent = 'Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20100101 Firefox/21.0';

$headers[] = "Connection: keep-alive";
$headers[] = "Accept: */*";


/* Begin Program Execution */

init_curl();
get_cookie();
sleep(30); // the login only succeeded with a delay between the two requests
login();

function get_cookie()
{
    global $ch, $ck, $h_attr, $headers, $cookie_page;

    curl_setopt($ch, CURLOPT_URL, $cookie_page);

    //curl_setopt($ch, CURLOPT_VERBOSE, true);
    $output = curl_exec($ch);
    //echo $output;

    /*
    $html = new simple_html_dom();
    $html = str_get_html($output);
    $e = $html->find(".loginform");
    $a = $e[0]->find('input');
    $str = $a[0]->outertext;
    preg_match("/value=\"(.*)\"/",$str,$match);
    $h_attr = $match[1];
    */
}

function init_curl()
{
    global $ch, $ck, $h_attr, $headers, $agent;

    ini_set('max_execution_time', 300);

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER,  0);
    curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);

    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);

    curl_setopt($ch, CURLOPT_CAINFO, getcwd() . '/cacert.pem');

    curl_setopt($ch, CURLOPT_USERAGENT, $agent); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 

    curl_setopt($ch, CURLOPT_COOKIEJAR, $ck);
    curl_setopt ($ch, CURLOPT_COOKIEFILE, $ck);
}

function login()
{
    global $ch, $login_url, $password, $username, $ck, $h_attr, $headers;

    //$fields['org.apache.struts.taglib.html.TOKEN'] = 'abc';//$h_attr;
    $fields['currentLocale']= 'en_US';
    $fields['username'] = $username;
    $fields['password'] = $password;
    $fields['x'] = 11;
    $fields['y'] = 4;

    $POSTFIELDS = http_build_query($fields); 
    //print_r($fields);
    //echo $POSTFIELDS;

    $headers[] = "Accept-Language: en-US,en;q=0.5";
    $headers[]="Referer: https://toefl-registration.ets.org/TOEFLWeb/extISERLogonPrompt.do";

    curl_setopt($ch, CURLOPT_URL, $login_url); 
    curl_setopt($ch, CURLOPT_HTTPHEADER,  $headers);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_POST, 1); 
    curl_setopt($ch, CURLOPT_POSTFIELDS, $POSTFIELDS); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
    $result = curl_exec($ch);
    print $result;
}
Answered 2013-07-03T08:22:15.657