
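I am trying to write a script that retrieves the HTML from my school's schedule search web page. I can open the page normally in a browser, but when I try to fetch it with cURL, I get the HTML of a redirect page instead. When I change the CURLOPT_FOLLOWLOCATION option from true to false, it only outputs a blank page with the headers.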

For reference, my PHP code is:

<?php
$curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');

curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");

$result = curl_exec($curl_connection);

print $result;

?>

The site I am trying to fetch the HTML from with cURL is https://www.registrar.usf.edu/ssearch/ (https://www.registrar.usf.edu/ssearch/search.php).

Any ideas?


1 Answer


I added two more lines so that the request now saves cookies. These cookies determine whether the site redirects you when you try to scrape the schedule page.

$curl_connection = curl_init();
$url = "https://www.registrar.usf.edu/ssearch/search.php";
curl_setopt($curl_connection, CURLOPT_URL, $url);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt');  // write received cookies to this file when the handle is closed
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // read cookies from this file and enable cookie handling
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_connection, CURLOPT_HEADER, true);
curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
$result = curl_exec($curl_connection);
echo $result;
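
To check that the "blank page with headers" from the question really is a redirect response, one could inspect the headers that CURLOPT_HEADER prepends to the output. The following is a minimal sketch, not part of the original answer: the helper name find_redirect_target is hypothetical, and the sample response is hand-written rather than captured from the real site.

```php
<?php
// Hypothetical helper: extracts the Location header from a raw response
// fetched with CURLOPT_HEADER enabled and CURLOPT_FOLLOWLOCATION disabled.
function find_redirect_target(string $rawResponse): ?string
{
    // Headers end at the first blank line; everything after it is the body.
    $parts = preg_split("/\r?\n\r?\n/", $rawResponse, 2);
    $headerBlock = $parts[0];

    // Header names are case-insensitive, so match "Location" loosely.
    if (preg_match('/^Location:\s*(\S+)/mi', $headerBlock, $match)) {
        return $match[1];
    }
    return null; // no redirect header present
}

// Hand-written sample resembling a redirect response (an assumption,
// not output captured from www.registrar.usf.edu).
$sample = "HTTP/1.1 302 Found\r\n"
        . "Set-Cookie: cookie_test=cookie_set; path=/ssearch/\r\n"
        . "Location: https://www.registrar.usf.edu/ssearch/search.php\r\n"
        . "\r\n";

var_dump(find_redirect_target($sample));
```

If the helper returns a URL while the body is empty, the server answered with a redirect, which would explain why only headers appear when CURLOPT_FOLLOWLOCATION is false.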

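Also, I haven't seen anyone pass the URL directly to curl_init before, although curl_init does accept an optional URL argument.

Here are the cookies: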

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

www.registrar.usf.edu   FALSE   /   FALSE   0   PHPSESSID   eied78t0v1qlqcop0rdk214361
www.registrar.usf.edu   FALSE   /ssearch/   FALSE   1336718465  cookie_test cookie_set
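
Each non-comment line in that file holds seven fields: domain, subdomain flag, path, secure flag, expiry timestamp, name, and value. As a rough sketch for inspecting what the server set, the file's text could be parsed like this. The helper name parse_cookie_jar is hypothetical, and splitting on any whitespace (real files use tabs) is an assumption made so that the listing above also parses.

```php
<?php
// Hypothetical helper: parses the text of a Netscape-format cookie file
// (as written by CURLOPT_COOKIEJAR) into a list of associative arrays.
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        $line = trim($line);
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix; strip it first.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        // Skip blank lines and comments.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Assumption: fields are separated by tabs or other whitespace.
        $fields = preg_split('/\s+/', $line, 7);
        if (count($fields) < 6) {
            continue; // malformed line
        }
        $cookies[] = [
            'domain'  => $fields[0],
            'path'    => $fields[2],
            'secure'  => $fields[3] === 'TRUE',
            'expires' => (int) $fields[4], // 0 means a session cookie
            'name'    => $fields[5],
            'value'   => $fields[6] ?? '', // value may be empty
        ];
    }
    return $cookies;
}

// The two cookies from the listing above, written with explicit tabs.
$sample = "# Netscape HTTP Cookie File\n"
        . "www.registrar.usf.edu\tFALSE\t/\tFALSE\t0\tPHPSESSID\teied78t0v1qlqcop0rdk214361\n"
        . "www.registrar.usf.edu\tFALSE\t/ssearch/\tFALSE\t1336718465\tcookie_test\tcookie_set\n";

foreach (parse_cookie_jar($sample) as $cookie) {
    echo $cookie['name'], ' => ', $cookie['value'], "\n";
}
```

In practice one would pass file_get_contents('cookie.txt') instead of the inline sample.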

If you want to debug a cURL request that is not working, start with var_dump(curl_getinfo($curl_connection)); the next thing to check is curl_error($curl_connection);
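
One possible way to combine those two checks is sketched below. The function names summarize_transfer and fetch_with_diagnostics are hypothetical, and the info array in the demo is hand-built so that the summary can be checked without a network call. The keys http_code, url, and redirect_count are among those returned by curl_getinfo.

```php
<?php
// Hypothetical helper: turns curl_getinfo() output and curl_error() text
// into a one-line summary that is easy to log.
function summarize_transfer(array $info, string $error): string
{
    if ($error !== '') {
        return 'cURL error: ' . $error;
    }
    return sprintf(
        'HTTP %d from %s after %d redirect(s)',
        $info['http_code'] ?? 0,
        $info['url'] ?? '(unknown)',
        $info['redirect_count'] ?? 0
    );
}

// Hypothetical wrapper showing where the two calls fit in a real request.
// It is defined here but not called, so the demo below needs no network.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch); // false on failure
    $summary = summarize_transfer(curl_getinfo($ch), curl_error($ch));
    return [$body, $summary];
}

// Hand-built info array standing in for real curl_getinfo() output.
echo summarize_transfer(
    [
        'http_code'      => 200,
        'url'            => 'https://www.registrar.usf.edu/ssearch/search.php',
        'redirect_count' => 1,
    ],
    ''
), "\n";
```

A redirect count above zero together with an unexpected final URL would suggest the cookie check is still failing.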

Answered 2012-05-09T06:41:19.193