
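I'm building a web scraper that extracts data from a website, and it's almost done, but I've run into a problem. I can log in and retrieve the cookie fine, but when I log in this way the site behaves strangely. (See screenshot.)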

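Once the page has initially loaded, the cookie seems to become useless (the cookie still exists; I've checked). Does anyone know what I'm doing wrong? I've looked through similar questions, but to no avail.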

Code:

<?php
session_write_close();
$ch = curl_init();
// Options shared by all three requests made on this one handle
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_COOKIEJAR, getcwd() . '/cookie.txt');   // save cookies to this file
curl_setopt($ch, CURLOPT_COOKIEFILE, getcwd() . '/cookie.txt');  // load cookies from this file
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Request 1: load the login page
curl_setopt($ch, CURLOPT_REFERER, "http://www.callofduty.com/");
curl_setopt($ch, CURLOPT_URL, "https://profile.callofduty.com/elite/login");
curl_exec($ch);

// Request 2: submit the login form
curl_setopt($ch, CURLOPT_REFERER, "https://profile.callofduty.com/elite/login");
curl_setopt($ch, CURLOPT_URL, 'https://profile.callofduty.com/elite/do_login');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'j_username=USERNAME&j_password=PASSWORD');
echo "Initial Dump: <p>";
echo curl_exec($ch);

// Request 3: load the profile page (same handle, so CURLOPT_POST and the
// POST fields from request 2 are still set)
curl_setopt($ch, CURLOPT_URL, "https://elite.callofduty.com/career/xbox/54d10030cc86b1b9c3162b395d46bffe#/playercardmw3");
curl_exec($ch);
echo "<hr/>Second Dump: <p>";
var_dump(curl_getinfo($ch));
?>

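Note: if I log in normally in another window in the background, the page works somewhat better and more information loads. This confuses me, because isn't the PHP script on the server the one handling the cookies? o_o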
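For context on the note above: the cookies in a browser window and the cookies the script uses are separate. cURL keeps its own cookie store in the file named by `CURLOPT_COOKIEJAR`/`CURLOPT_COOKIEFILE`, written in the Netscape cookie-file format (one tab-separated line per cookie: domain, include-subdomains flag, path, secure flag, expiry, name, value; cURL prefixes HttpOnly cookies with `#HttpOnly_`). A minimal sketch for inspecting what the script actually saved, assuming that format; the function name `parseCookieJar` is hypothetical:

```php
<?php
// Hypothetical helper: parse the contents of a Netscape-format cookie file,
// as written by cURL via CURLOPT_COOKIEJAR, into an array of cookies.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n/', $contents) as $line) {
        $httpOnly = false;
        // cURL marks HttpOnly cookies with this prefix rather than writing a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $httpOnly = true;
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comment lines
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // 0 means a session cookie
            'name'       => $name,
            'value'      => $value,
            'httponly'   => $httpOnly,
        ];
    }
    return $cookies;
}

// Usage, assuming the script runs from the same directory as in the question:
// print_r(parseCookieJar(file_get_contents(getcwd() . '/cookie.txt')));
```

Comparing the parsed list against the cookies the browser holds for the same site shows whether the script ever received the session cookie at all.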

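--- Update --- OK, I don't know why, since I haven't changed anything, but now when I load the site it looks normal, just without the information I want (http://gyazo.com/e326f2f4cdac3e6a4a20fdc9afc62f2d.png?1340088915). However, it shows me as logged out. (Note: you can't view a profile while logged out; the site forces you to the login screen.)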

Here is a printout of the echoed request and the var_dump: http://gyazo.com/ded134560cdf6c6ecf0b27221f35e32b.png?1340110136

As far as I can tell, even though I get the cookie, the site still thinks I'm logged out.
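One thing worth checking, offered as an assumption rather than a confirmed cause: the login runs on `profile.callofduty.com`, but the profile page lives on `elite.callofduty.com`. A cookie is only sent to a different host if its domain covers that host; a host-only cookie (include-subdomains `FALSE` in the cookie file) goes back only to the exact host that set it. A minimal sketch of that rule, simplified from the domain-matching rule in RFC 6265 (path, secure flag and expiry are ignored); the function name `cookieAppliesToHost` is hypothetical:

```php
<?php
// Hypothetical helper: would a cookie stored for $cookieDomain be sent to $host?
// Simplified RFC 6265 domain matching; ignores path, secure flag and expiry.
function cookieAppliesToHost(string $cookieDomain, bool $includeSubdomains, string $host): bool
{
    $host = strtolower($host);
    $domain = strtolower(ltrim($cookieDomain, '.'));
    if (!$includeSubdomains) {
        // Host-only cookie: returned only to the exact host that set it.
        return $host === $domain;
    }
    // Domain cookie: returned to the domain itself and to any of its subdomains.
    $suffix = '.' . $domain;
    return $host === $domain || substr($host, -strlen($suffix)) === $suffix;
}

// A host-only cookie set by profile.callofduty.com would not reach elite.callofduty.com:
var_dump(cookieAppliesToHost('profile.callofduty.com', false, 'elite.callofduty.com')); // bool(false)
// A cookie set for the whole .callofduty.com domain would:
var_dump(cookieAppliesToHost('.callofduty.com', true, 'elite.callofduty.com'));          // bool(true)
```

The first two columns of each line in `cookie.txt` give the values to plug in here.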

Thanks in advance!


1 Answer


When you jump to other parts of the site, you need to specify the cookie file again. Do something like this:
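The point of this answer, that every request must load cookies from the same file the login request saved them to, can be packaged as a small helper. This is a sketch under assumptions: the helper name `cookieHandle` is hypothetical, and the network calls are commented out so it runs without contacting the site.

```php
<?php
// Hypothetical helper: create a cURL handle that loads cookies from, and saves
// cookies to, one shared file, so a session started on one handle can be
// continued on another.
function cookieHandle(string $cookieFile, string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => $cookieFile, // load cookies from this file (also enables the cookie engine)
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies here when the handle is destroyed
        CURLOPT_RETURNTRANSFER => true,        // return the body from curl_exec() instead of printing it
    ]);
    return $ch;
}

// Use one absolute path everywhere; a relative name is resolved against the
// current working directory, and "cookie.txt" and "cookies.txt" are different files.
$cookieFile = getcwd() . '/cookie.txt';

// Login request (setting POST fields makes this a POST).
$login = cookieHandle($cookieFile, 'https://profile.callofduty.com/elite/do_login');
curl_setopt($login, CURLOPT_POSTFIELDS, 'j_username=USERNAME&j_password=PASSWORD');
// curl_exec($login);
// unset($login); // destroying the handle writes the cookies to $cookieFile

// Later request on a fresh handle, sending the saved cookies.
$profile = cookieHandle($cookieFile, 'https://elite.callofduty.com/career/xbox/54d10030cc86b1b9c3162b395d46bffe#/playercardmw3');
// $html = curl_exec($profile);
```

Using a fresh handle for the second request also means `CURLOPT_POST` and the POST fields from the login request are not carried over to it.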

<?php
function login() {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);      // add this line
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // add this line
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($ch, CURLOPT_COOKIEJAR, getcwd() . '/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, getcwd() . '/cookie.txt');
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.callofduty.com/");
    curl_setopt($ch, CURLOPT_URL, "https://profile.callofduty.com/elite/login");
    curl_exec($ch);

    curl_setopt($ch, CURLOPT_REFERER, "https://profile.callofduty.com/elite/login");
    curl_setopt($ch, CURLOPT_URL, 'https://profile.callofduty.com/elite/do_login');
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, 'j_username=USERNAME&j_password=PASSWORD');
    echo "Initial Dump: <p>";
    echo curl_exec($ch);
    // $ch is destroyed when login() returns; that is when libcurl writes cookie.txt
}

function getPlayer() {
    login();

    // Fresh handle, so the POST options from login() are not carried over
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_URL, "https://elite.callofduty.com/career/xbox/54d10030cc86b1b9c3162b395d46bffe#/playercardmw3");
    // add this line: load the cookies login() saved (must be the same file)
    curl_setopt($ch, CURLOPT_COOKIEFILE, getcwd() . '/cookie.txt');
    curl_exec($ch);
    echo "<hr/>Second Dump: <p>";
    var_dump(curl_getinfo($ch));
}

getPlayer();

I haven't tested this, because your code is a mess, but give it a try.

answered 2012-07-21T11:33:49.350