
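I have read more than 20 related questions on this site and searched Google, but to no avail. I am new to PHP and am using PHP Simple HTML DOM Parser to fetch a URL. The script works on a local test page, but it does not work on the URL I actually need it for.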

Here is the code I wrote for this, following the example file that ships with the PHP Simple HTML DOM Parser library:

<?php

include('simple_html_dom.php');

$html = file_get_html('http://www.farmersagent.com/Results.aspx?isa=1&name=A&csz=AL');

foreach($html->find('li.name ul#generalListing') as $e)
echo $e->plaintext;  

?>

Here is the error message I get:

Warning: file_get_contents(http://www.farmersagent.com/Results.aspx?isa=1&amp;name=A&amp;csz=AL) [function.file-get-contents]: failed to open stream: Redirection limit reached, aborting in /home/content/html/website.in/test/simple_html_dom.php on line 70

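Please guide me on what I should do to make it work. I'm new to this, so please suggest a simple approach. After reading other questions and answers on this site, I tried the cURL approach of creating a handle, but could not get it to work. The cURL code I tried keeps returning "Resource" or "Object", and I don't know how to pass that to Simple HTML DOM Parser so that $html->find() works.

Please help! Thanks!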


5 Answers


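I ran into a similar problem today. I was using cURL and it did not report any error. When I tested with file_get_contents(), I got...

failed to open stream: Redirection limit reached, aborting

After some searching, I ended up with this function, which works for my case...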

function getPage($url) {
    $useragent   = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36';
    $timeout     = 120;
    $dir         = dirname(__FILE__);
    // One cookie file per client; the cookies/ directory must exist and be writable.
    $cookie_file = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Read cookies from, and write them back to, the same file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // An empty string accepts every encoding that cURL supports.
    curl_setopt($ch, CURLOPT_ENCODING, "");
    // Return the response body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');

    $content = curl_exec($ch);
    if (curl_errno($ch)) {
        echo 'error:' . curl_error($ch);
        $content = false;
    }
    // Close the handle whether or not the request succeeded.
    curl_close($ch);

    // The response body as a string, or false on failure.
    return $content;
}

The site checks for a valid user agent and cookies.

The cookie issue was what caused it! :) Peace!
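Two things to watch with the function above: as far as I know, cURL does not create the directory for the cookie jar, and $_SERVER['REMOTE_ADDR'] is not set when the script runs from the command line. Here is a minimal sketch of one way to handle both. The helper name cookieJarPath(), the 'cli' fallback key, and the use of the system temp directory are my own assumptions, not part of the original function.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// makes sure the cookie directory exists and returns a per-client file path.
function cookieJarPath(string $baseDir, string $clientKey): string
{
    $dir = rtrim($baseDir, '/') . '/cookies';

    // Create the directory if it is missing; re-check in case of a race.
    if (!is_dir($dir) && !mkdir($dir, 0700, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create cookie directory: $dir");
    }

    return $dir . '/' . md5($clientKey) . '.txt';
}

// REMOTE_ADDR is missing on the command line, so fall back to a fixed key.
$key = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'cli';
$cookie_file = cookieJarPath(sys_get_temp_dir(), $key);

echo $cookie_file, "\n";
```

The resulting path can then be passed to CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR exactly as in getPage().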

answered 2015-07-29T15:00:59.580

Workaround:

<?php
$context = stream_context_create(
    array(
        'http' => array(
            'max_redirects' => 101
        )
    )
);
$content = file_get_contents('http://example.org/', false, $context);
?>

If there is a proxy in between, you can specify that in the context as well:

$aContext = array('http'=>array('proxy'=>$proxy,'request_fulluri'=>true));
$cxContext = stream_context_create($aContext);

More details: https://cweiske.de/tagebuch/php-redirection-limit-reached.htm (thanks @jqpATs2w)
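To put the two snippets together, here is a minimal sketch that builds one context with the redirect limit, an optional proxy, and a user agent, then reads the options back to confirm what was set. The helper name buildHttpContext(), the user-agent string, and the 30-second timeout are assumptions for illustration. Per the PHP manual, max_redirects defaults to 20. Raising it will probably not help if the site redirects in a loop until it sees a cookie.

```php
<?php
// Hypothetical helper (an assumption, not from the answer): builds an HTTP
// stream context for file_get_contents().
function buildHttpContext(int $maxRedirects = 20, ?string $proxy = null)
{
    $http = array(
        'max_redirects' => $maxRedirects,
        // Assumed value; some sites reject requests without a browser-like user agent.
        'user_agent'    => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
        'timeout'       => 30,
    );

    if ($proxy !== null) {
        // e.g. 'tcp://127.0.0.1:8080' (hypothetical proxy address)
        $http['proxy']           = $proxy;
        $http['request_fulluri'] = true;
    }

    return stream_context_create(array('http' => $http));
}

$context = buildHttpContext(101);
$options = stream_context_get_options($context);

// Shows the limit that will be applied to the request.
echo $options['http']['max_redirects'], "\n";

// Usage: $content = file_get_contents('http://example.org/', false, $context);
```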

answered 2016-08-29T21:54:54.327

With cURL, you need to set the CURLOPT_RETURNTRANSFER option to true so that the call to curl_exec returns the response body, like this:

$url = 'http://www.farmersagent.com/Results.aspx?isa=1&name=A&csz=AL';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// You may set this option if you need to follow redirects, though I didn't get any in your case.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
$content = curl_exec($curl);
curl_close($curl);

$html = str_get_html($content);
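
The "Resource" or "Object" you saw is most likely the handle itself: curl_init() returns a handle (a resource in PHP 7, a CurlHandle object in PHP 8), and curl_exec() only returns the page as a string when CURLOPT_RETURNTRANSFER is set. Here is a small sketch that wraps this up and fails loudly instead of handing false to the parser. The function name fetchBody() is my own, not part of cURL or Simple HTML DOM.

```php
<?php
// Hypothetical wrapper (an assumption for illustration): returns the response
// body as a string, or throws if the request fails.
function fetchBody(string $url): string
{
    // This is a handle, not the page content.
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);

    // With RETURNTRANSFER set: a string on success, false on failure.
    $body  = curl_exec($curl);
    $error = curl_error($curl);
    curl_close($curl);

    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }

    return $body;
}

// Usage, once simple_html_dom.php is included:
// $html = str_get_html(fetchBody($url));
// foreach ($html->find('li.name') as $e) { echo $e->plaintext; }
```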
answered 2012-08-28T17:13:53.543

I also needed to add this HTTP context option: ignore_errors

See: https://www.php.net/manual/en/context.http.php

$arrContextOptions = array(
    'ssl' => array(
        // skip the errors "Failed to enable crypto" and "SSL operation failed with code 1."
        'verify_peer'      => false,
        'verify_peer_name' => false,
    ),
    // skip the errors "failed to open stream: operation failed" and "Redirection limit reached"
    'http' => array(
        'max_redirects' => 101,
        'ignore_errors' => true,
    ),
);

$file = file_get_contents($file_url, false, stream_context_create($arrContextOptions));

Obviously, I only use this for quick debugging in my local environment. It is not meant for production.
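
One side effect of ignore_errors is that file_get_contents() returns the body even for 4xx and 5xx responses, so it can be worth checking the status yourself. After the call, PHP fills $http_response_header with the header lines, including one status line per redirect hop. This is a rough sketch of reading the final status. The helper name finalStatusCode() and the sample header lines are made up for illustration.

```php
<?php
// Hypothetical helper (an assumption): returns the status code of the last
// response found in the header lines, or null if there is no status line.
function finalStatusCode(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Every hop contributes its own "HTTP/x.y NNN ..." line; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Sample lines standing in for $http_response_header after one redirect.
$sample = array(
    'HTTP/1.1 302 Found',
    'Location: /Results.aspx',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);

echo finalStatusCode($sample), "\n"; // prints 200
```

In real use you would call finalStatusCode($http_response_header) right after file_get_contents().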

answered 2021-01-13T12:47:41.600

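I'm not sure why you are redefining the $html object with the result of str_get_html; the object is meant to be used for searching the string. If you overwrite the object with a string, the object no longer exists and cannot be used.

Anyway, here is how to search the string returned from curl.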

<?php
$url = 'http://www.example.com/Results.aspx?isa=1&name=A&csz=AL';

include('simple_html_dom.php');

# create object
$html = new simple_html_dom();

#### CURL BLOCK ####

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
# you may set this option if you need to follow redirects,
# though I didn't get any in your case.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);

$content = curl_exec($curl);
curl_close($curl);

# load the curl string into the object.
# load() takes the raw HTML string, so pass $content directly.
$html->load($content);

#### END CURL BLOCK ####

# without the curl block above you would just use this instead:
# $html->load_file($url);

# choose the tag to find, you're not looking for attributes here.
# find() returns an array of the matching elements.
foreach ($html->find('a') as $element) {
    # this is looking for anchor tags in the given string.
    # you output an attribute's contents using the name of the attribute.
    echo $element->href . "\n";
}
?>

You may be searching for a different tag; the approach is the same.

# just outputting a different tag attribute
foreach ($html->find('a') as $element) {
    echo $element->class;
    echo $element->id;
}
answered 2013-04-01T14:46:42.627