
I'm trying to practice using cURL, but it isn't working well. Please tell me what is wrong here. This is my code:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://xxxxxxx.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
curl_setopt($ch, CURLOPT_USERAGENT, "Google Bot");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

$downloaded_page = curl_exec($ch);
curl_close($ch);
preg_match_all('/<div\s* class =\"abc\">(.*)<\/div>/', $downloaded_page, $title); 
echo "<pre>";
print($title[1]);  
echo "</pre>";

The warning is: Notice: Array to string conversion

The HTML I want to parse looks like this:

<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>

2 Answers


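preg_match_all fills its third argument with an array of arrays, so $title[1] is itself an array and cannot be printed directly. That is where the "Array to string conversion" notice comes from.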
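To make the shape of that array concrete, here is a minimal sketch that runs against the sample markup from the question instead of a downloaded page. It assumes the "s" modifier and a lazy ".*?" so that the capture can span the line breaks inside the div; neither appears in the question's pattern.

```php
<?php
// Sample markup copied from the question; no network request is made here.
$html = <<<'HTML'
<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>
HTML;

// Assumption for this sketch: "s" lets "." match newlines, and ".*?" stops
// at the first closing </div> instead of the last one.
$count = preg_match_all('/<div\s+class="abc">(.*?)<\/div>/s', $html, $title);

// $title[0] holds the full matches, $title[1] holds the first capture group.
// Both are arrays, so they have to be looped over (or passed to print_r).
foreach ($title[1] as $inner) {
    echo trim($inner), "\n";
}
```

print_r($title) is a quick way to inspect the whole nested structure while debugging.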

If your code is:

preg_match_all('/<div\s+class="abc">(.*)<\/div>/', $downloaded_page, $title); 

then what you actually want to do is:

echo "<pre>";
foreach ($title[1] as $realtitle) {
    echo $realtitle . "\n";
}
echo "</pre>";

because it searches for every div that has the class "abc". I would also suggest hardening your regular expression to make it more robust:

// "s" lets "." match newlines; ".*?" stops at the first closing </div>
preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $downloaded_page, $title);

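This will match a div whose only attribute is class="abc" as well as a div that carries other attributes before or after the class attribute.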
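A small sketch of what that pattern accepts and rejects. The three div elements and their attribute values are made up for illustration. The pattern here uses a lazy ".*?" with the "s" modifier so that each div is captured separately.

```php
<?php
// Hypothetical input: one plain div, one with extra attributes, one with another class.
$html = '<div class="abc">one</div>'
      . '<div id="x" class="abc" style="color:red">two</div>'
      . '<div class="other">three</div>';

// [^>]+ and [^>]* allow arbitrary attributes around class="abc",
// but cannot cross the ">" that ends the opening tag.
preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $html, $title);

print_r($title[1]);
```

Note that a regular expression like this still cannot handle a div nested inside the matched div, because the capture ends at the first closing tag it sees.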

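By the way: DOMDocument is very slow. I have found that regular expressions can sometimes be up to 40 times faster, depending on the size of the document. Keep it simple.

Best, Nicolas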

Answered 2013-10-13T21:30:36.693

Don't parse HTML with regular expressions.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.lipsum.com/');
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
# foreach ($xpath->query('//div') as $div) { // all div's in html
foreach ($xpath->query('//div[contains(@class, "abc")]') as $div) { // all div's that have "abc" classname
    // $div->nodeValue contains fetched DIV content
}
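
For completeness, here is a self-contained sketch of the same DOM approach applied to the markup from the question, without the cURL request. Two things in it are assumptions on my part and not part of the code above: the extra div with class "abcdef", added only to show a pitfall, and the stricter XPath expression. contains(@class, "abc") also selects class names that merely contain "abc", so the sketch matches "abc" as a whole class token instead.

```php
<?php
// Markup from the question plus one hypothetical div that should NOT be selected.
$html = <<<'HTML'
<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>
<div class="abcdef">not wanted</div>
HTML;

// Collect parser warnings internally instead of silencing them with "@".
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the class attribute with spaces so " abc " only matches a whole class name.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " abc ")]';

$items = [];
foreach ($xpath->query($query) as $div) {
    // Query relative to the matched div to collect each <ul> inside it.
    foreach ($xpath->query('.//ul', $div) as $ul) {
        $items[] = trim($ul->textContent);
    }
}
print_r($items);
```

Unlike the regular expression approach, this keeps working when the target div contains nested div elements.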
Answered 2013-10-13T10:04:18.053