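I'm trying to use curl to log in to one of my websites so I can pull information from a page. It doesn't seem to be working. Here is the code I'm trying. If it helps, I can create a username/password for this case.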

<?php

$username = 'xxx';
$password = 'xxx';
$loginUrl = 'http://gwintersdev.com/user';
$finalUrl = 'http://gwintersdev.com/admin';

$userinput = 'name';
$passwordinput = 'pass';

// Request 1: log in and save the session cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt"); // written when the handle is closed
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_POST, 1);
// http_build_query() URL-encodes the field names and values
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    $userinput => $username,
    $passwordinput => $password,
)));
curl_setopt($ch, CURLOPT_USERAGENT, 'user-agent');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it

curl_exec($ch); // execute the login request; the response body is discarded

curl_close($ch);
unset($ch);

// Request 2: load the protected page with the saved cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_URL, $finalUrl);

$buf2 = curl_exec($ch);

curl_close($ch);
print $buf2;
?>        

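Update: I was able to get the above working, but I'm trying the same thing against a different ASP site and it isn't working. I grabbed all the hidden fields and added them to the post string, but it still won't log in.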

<?php
$username = 'xxx';
$password = 'xxx';
$loginUrl = 'http://vitalstim.com/health_professionals/certified_provider_resources/forum.aspx';
$finalUrl = 'http://vitalstim.com/health_professionals/certified_provider_resources/forum.aspx';
$userinput = 'ctl00$ContentPlaceHolder1$uc_login$txtUser';
$passwordinput = 'ctl00$ContentPlaceHolder1$uc_login$txtPass';
$login = 'ctl00$ContentPlaceHolder1$uc_login$butLogin';

$validation_input = '__EVENTVALIDATION';
$validation_input_value = '/wEWAgKf+PTrBQKItpn5BDXHCHsANbEpwkEBmMyNv+32L2Ec';
$view_state = '/wEPDwUJLTQyMjg0NzI0D2QWAmYPZBYGAgEPZBYEAgYPFgIeB1Zpc2libGVoZAIHDxYCHwBoZAIDD2QWBAIBD2QWCAIBD2QWBAIBDw8WAh4EVGV4dGVkZAIFDw8WAh8AaGRkAgcPZBYCAgEPZBYCAgMPZBYCAgEPFgIfAGhkAgkPDxYCHwBoZGQCCw8PFgIfAGhkZAIDDxYCHwBoZAIFDw8WAh8BBXY8c2NyaXB0IGxhbmd1YWdlPSJqYXZhc2NyaXB0IiB0eXBlPSJ0ZXh0L2phdmFzY3JpcHQiPgokKGRvY3VtZW50KS5yZWFkeShmdW5jdGlvbigpIHsKVml0YWxTdGltLkluaXQoNCk7Cn0pOwo8L3NjcmlwdD4KZGRkdz/7+FcQ1E1sbC0Gua3jJsCGSnM=';
$event_valid = '/wEWBwKeiM4xAoi2mfkEAurz/r4MAvTX0jYC+4GopQkCo6iimggC2pO41g77y84VwyhP6Ek+7PGZYDNgOawRZw==';

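$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return responses instead of printing them
curl_setopt($ch, CURLOPT_USERAGENT, 'user-agent');
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_POST, 1);
// http_build_query() URL-encodes the "+", "/" and "=" in the base64 values.
// The field is __VIEWSTATE (two underscores), and __EVENTVALIDATION is sent
// once, using $event_valid; $validation_input_value is no longer sent.
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    $userinput => $username,
    $passwordinput => $password,
    $login => 'login',
    $validation_input => $event_valid,
    '__VIEWSTATE' => $view_state,
)));
curl_exec($ch); // log in; the response body is discarded

// Same handle, so the session cookie is reused; switch back to GET
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_URL, $finalUrl);

$buf2 = curl_exec($ch);
curl_close($ch);
print $buf2;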
?>

2 Answers

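It looks like you're missing two hidden fields from that form's source. They act as a kind of CSRF protection. You could try scraping them: make a third request before the other two, pull those values out of the response, and include them in the login POST.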
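Pulling hidden fields out of a fetched page could look roughly like the sketch below. The function name `extractHiddenFields`, the sample markup, and the field names `hidden_one` and `hidden_two` are all made up for illustration, and it assumes PHP's DOM extension is available.

```php
<?php
// Hypothetical helper: collect the name/value pair of every
// <input type="hidden"> element found in an HTML document.
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    // Note: this match is case-sensitive, so type="HIDDEN" would be missed.
    foreach ($xpath->query('//input[@type="hidden"]') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Invented sample standing in for the body returned by the first GET request.
$sampleHtml = '<form method="post">'
    . '<input type="hidden" name="hidden_one" value="abc123">'
    . '<input type="hidden" name="hidden_two" value="login_form">'
    . '<input type="text" name="name">'
    . '</form>';

$hidden = extractHiddenFields($sampleHtml);
print_r($hidden);
```

In a real run you would pass in the body returned by `curl_exec()` for the login page, then merge the resulting array with the username and password fields before posting.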

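Also, like I said in my comment above, don't close your curl handle.

Any more information you can provide would be great.

Edit:

As for the ASP page: ASP.NET is very hard to drive with curl. It tends to hide fields that you need. My suggestion is to create a fake page that does a print_r of $_POST and $_GET, then use Chrome or Firebug to change the action of the form on their page so that it submits to your page. That way you can check whether you're missing anything.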
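A debug page of that kind can be very small. This is only a sketch: the function name `renderRequestDump` is made up, and where you host the file is up to you.

```php
<?php
// Hypothetical debug endpoint: echoes back whatever a form submits to it.
function renderRequestDump(array $post, array $get): string
{
    // print_r(..., true) returns the dump as a string instead of printing it.
    return "POST:\n" . print_r($post, true) . "\nGET:\n" . print_r($get, true);
}

// Plain text keeps the print_r indentation readable in the browser.
header('Content-Type: text/plain; charset=utf-8');
echo renderRequestDump($_POST, $_GET);
```

Point the form's action at this page in the browser's developer tools, submit it once, and compare the field names it prints with the ones in your curl POST.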

I did what I suggested, and I got this:

Array
(
    [__EVENTTARGET] => 
    [__EVENTARGUMENT] => 
    [__VIEWSTATE] => /wEPDwUJLTQyMjg0NzI0D2QWAmYPZBYGAgEPZBYEAgYPFgIeB1Zpc2libGVoZAIHDxYCHwBoZAIDD2QWBAIBD2QWCAIBD2QWBAIBDw8WAh4EVGV4dGVkZAIFDw8WAh8AaGRkAgcPZBYCAgEPZBYCAgMPZBYCAgEPFgIfAGhkAgkPDxYCHwBoZGQCCw8PFgIfAGhkZAIDDxYCHwBoZAIFDw8WAh8BBXY8c2NyaXB0IGxhbmd1YWdlPSJqYXZhc2NyaXB0IiB0eXBlPSJ0ZXh0L2phdmFzY3JpcHQiPgokKGRvY3VtZW50KS5yZWFkeShmdW5jdGlvbigpIHsKVml0YWxTdGltLkluaXQoNCk7Cn0pOwo8L3NjcmlwdD4KZGRkdz/7+FcQ1E1sbC0Gua3jJsCGSnM=
    [ctl00$ContentPlaceHolder1$uc_login$txtUser] => test
    [ctl00$ContentPlaceHolder1$uc_login$txtPass] => test
    [ctl00$ContentPlaceHolder1$uc_login$butLogin] => Login
    [__EVENTVALIDATION] => /wEWBwKeiM4xAoi2mfkEAurz/r4MAvTX0jYC+4GopQkCo6iimggC2pO41g77y84VwyhP6Ek+7PGZYDNgOawRZw==
)
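One thing to watch when copying these values into a POST body: they are base64, so they contain `+`, `/` and `=`, and a raw `+` in a form-encoded body is decoded as a space. A rough sketch of building the body with `http_build_query()` follows. The field names come from the dump above, but the two long values are replaced with short placeholders I made up, so substitute the real ones.

```php
<?php
// Field names as they appear in the dump; the __VIEWSTATE and
// __EVENTVALIDATION values here are shortened, invented placeholders.
$fields = [
    '__EVENTTARGET' => '',
    '__EVENTARGUMENT' => '',
    '__VIEWSTATE' => '/wEPDwUJ+abc=',
    'ctl00$ContentPlaceHolder1$uc_login$txtUser' => 'test',
    'ctl00$ContentPlaceHolder1$uc_login$txtPass' => 'test',
    'ctl00$ContentPlaceHolder1$uc_login$butLogin' => 'Login',
    '__EVENTVALIDATION' => '/wEWBwKe+xyz==',
];

// http_build_query() URL-encodes every name and value, so "+" becomes %2B,
// "/" becomes %2F, "=" becomes %3D and "$" becomes %24.
$body = http_build_query($fields);
echo $body, "\n";
```

The resulting string can be passed straight to `CURLOPT_POSTFIELDS`.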
answered 2013-05-13T23:48:29.840

This ended up doing the trick. http://www.mishainthecloud.com/2009/12/screen-scraping-aspnet-application-in.html?showComment=1368565341638#c9104469935977149435

answered 2013-05-14T21:02:53.127