php - "Forbidden" shown when fetching data

Question

I want to fetch data from a website, but the response says I am forbidden. Take a look at this code:

<?php
$link='http://www.sitedossier.com/site/wikipedia.org';
$so=file_get_contents($link);
echo ($link);
echo "<br>";
echo ($so);
?>

But it shows "Forbidden". The script output is:

http://www.sitedossier.com/site/wikipedia.org
Forbidden.

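But if I request only the main site, http://www.sitedossier.com, the data is fetched successfully.

What is the problem here? Is my script wrong, or does the site block all scripts? If it does, how can I get around it?

Thanks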

score 1 · Accepted Answer

Some websites don't like bots. If the cURL extension is available to you, you can use it to get around this:

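<?php
$ch = curl_init('http://www.sitedossier.com/site/wikipedia.org');
// Follow any redirects the server sends
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Send a browser-like User-Agent string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20121221 Firefox/20.0');
// Return the response body from curl_exec() instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.sitedossier.com');
// Read cookies from and write cookies to the same file
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

$data = curl_exec($ch);
if ($data === false) {
    // curl_exec() returns false on transport-level failure
    echo 'cURL error: ' . curl_error($ch);
} else {
    echo $data;
}
?>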

Edit: It works now. My guess is that they set a cookie that the site requires.

score 0 · Accepted Answer

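The site requires a User-Agent string, which file_get_contents does not send by default.

Use fsockopen and related functions to make sure the correct headers are sent.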
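As a lighter alternative to fsockopen, file_get_contents can also send a User-Agent through a stream context. The following is a minimal sketch, not the answerer's method: the helper names build_browser_context and fetch_with_user_agent are hypothetical, the User-Agent and Referer values are borrowed from the cURL answer above, and whether the site accepts the request without cookies is an assumption that has not been verified.

```php
<?php
// Hypothetical helper: builds a stream context so that file_get_contents()
// sends a User-Agent and a Referer header with its HTTP request.
function build_browser_context(string $userAgent, string $referer)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,                 // sent as the User-Agent header
            'header'     => "Referer: $referer\r\n",    // extra request header
            'timeout'    => 10,                         // read timeout in seconds
        ],
    ]);
}

// Hypothetical helper: fetches a URL with the context above.
// Returns the response body, or null if the request failed.
function fetch_with_user_agent(string $url, string $userAgent, string $referer): ?string
{
    $body = @file_get_contents($url, false, build_browser_context($userAgent, $referer));
    return $body === false ? null : $body;
}

// Example call (performs a real network request, so it is left commented out):
// $so = fetch_with_user_agent(
//     'http://www.sitedossier.com/site/wikipedia.org',
//     'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20121221 Firefox/20.0',
//     'http://www.sitedossier.com'
// );
// echo $so === null ? 'Request failed' : $so;
```

If the site also depends on cookies, as the first answer suspects, this approach will not be enough on its own and the cURL version with a cookie jar is the safer choice.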
