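I need to retrieve the HTML content (source) of a page, for example www.google.com. For that I can use file_get_contents or curl_init in PHP.
This is just like a question someone asked before:
How do I get the HTML code of a web page in PHP?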
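What matters more to me, though, is that some pages are "Access Required".
I have been granted access and I know the password.
(Assume the page asks for the password with a form, and the password is "abcd".)
So how can I read such pages programmatically using PHP?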
Update (the answer, for me):
I found the solution in curl-setopt, as suggested by Bekzat Abdiraimov below. Here is the full code, which I found somewhere and modified:
<?php
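function curl_grab_page($url, $ref_url, $data, $login, $proxy, $proxystatus){
    // $login and $proxystatus are passed as the strings 'true' / 'false', not booleans.
    if ($login == 'true') {
        // Start from an empty cookie jar so no stale session is reused.
        $fp = fopen("cookie.txt", "w");
        fclose($fp);
    }
    $ch = curl_init();
    // Store and resend cookies so the login session survives between requests.
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    // Use the caller's browser user agent when available (it is not set on the CLI), otherwise a fixed one.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 40);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    if ($proxystatus == 'true') {
        curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    // Certificate checks are disabled here; enable them for anything but testing.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_REFERER, $ref_url);
    // Include the response headers in the returned string.
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    // Step 1: POST the login form values to the form's action URL.
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_exec($ch);
    // Step 2: fetch the protected page with a plain GET, reusing the session cookies.
    // (Without CURLOPT_HTTPGET the handle would POST the login values again.)
    curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
    curl_setopt($ch, CURLOPT_URL, $ref_url);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}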
/*
 * $auth_processing_url: the 'action' URL of the login form. For <form method=post action='http://www.abc.com/login.asp'> it is "http://www.abc.com/login.asp".
 * $url_to_go_after_login: the URL of the page you want to fetch after logging in.
 * $login_post_values: the login form's input names with their values. For <input name="username" /><input name="password" /> it is "username=4lvin&password=mypasswd".
 */
echo curl_grab_page($auth_processing_url, $url_to_go_after_login, $login_post_values, "true", "null", "false");
?>
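A note on $login_post_values: writing the string by hand works for simple values, but a password containing characters such as & or @ has to be URL-encoded first. Here is a minimal sketch of one way to build the string with PHP's http_build_query. The helper name build_login_post_values is hypothetical and not part of the code above; the field names come from the example form in the comment.

```php
<?php
// Hypothetical helper (not part of the original post): builds the
// URL-encoded POST body from an array of form field names and values.
function build_login_post_values(array $fields) {
    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}

// Field names taken from the example form in the comment above.
echo build_login_post_values(array('username' => '4lvin', 'password' => 'mypasswd'));
// prints: username=4lvin&password=mypasswd
```

The result can then be passed as the third argument of curl_grab_page in place of the hand-written string.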