
I have this page: http://www.statistics.com/index.php?page=glossary&term_id=703

Specifically, this part:

<b>Additive Error:</b>
<p> Additive error is the error that is added to the true value and does not 
depend on the true value itself. In other words, the result of the measurement is 
considered as a sum of the true value and the additive error:   </p> 

I am trying my best to get the text between the <p> and </p> tags, using this:

include('simple_html_dom.php');
$url = 'http://www.statistics.com/index.php?page=glossary&term_id=703';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$html = new simple_html_dom();
$html->load($curl_scraped_page);

foreach ( $html->find('b') as $e ) {
    echo $e->innertext . '<br>';
}

It gives me:

Additive Error:
Browse Other Glossary Entries

I tried changing the foreach to: foreach ( $html->find('b p') as $e ) {

and then to: foreach ( $html->find('/b p') as $e ) {

After that it only ever gives me a blank page. What am I doing wrong? Thanks.


3 Answers


Why not use PHP's built-in DOM extension and XPath?

libxml_use_internal_errors(true);  // <- you might need this if that page has markup errors
$dom = new DOMDocument();
$dom->loadHTML($curl_scraped_page);
$xpath = new DOMXPath($dom);
print $xpath->evaluate('string(//p[preceding::b]/text())');
//                             ^
//  this will get you text content from <p> tags preceded by <b> tags

If there are multiple <p> tags preceded by <b> tags and you only want the first one, adjust the XPath query to:

string((//p[preceding::b]/text())[1])

To get them all as a DOMNodeList, omit the string() function: //p[preceding::b]/text(). You can then iterate over the list and access the textContent property of each node...
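
A minimal sketch of that iteration, assuming a small stand-in HTML string ($sample is hypothetical, as is the $texts array) in place of the $curl_scraped_page fetched in the question:

```php
<?php
// Hypothetical stand-in markup; in the question this would be $curl_scraped_page.
$sample = '<html><body><b>Additive Error:</b>'
        . '<p>First paragraph.</p><p>Second paragraph.</p></body></html>';

libxml_use_internal_errors(true);   // keep libxml parse warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// Without string(), query() returns a DOMNodeList of every matching text node.
$texts = [];
foreach ($xpath->query('//p[preceding::b]/text()') as $node) {
    $texts[] = trim($node->textContent);
}

print implode("\n", $texts) . "\n";
```

Collecting the values into an array first keeps the extraction separate from the output, so the same loop can feed whatever comes next.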

answered 2013-06-18T18:04:28.537

If you want all content which is inside b or p tags, you can simply do foreach ($html->find('b,p') as $e) { ... }.

answered 2013-06-18T17:56:27.263

Try this:

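<?php
$dom = new DOMDocument();
// @ suppresses the warnings libxml raises for malformed HTML
@$dom->loadHTMLFile('http://www.statistics.com/index.php?page=glossary&term_id=703');
$xpath = new DOMXPath($dom);

$mytext = '';
// Look only at the first <font> element and take its first descendant <p>
foreach ($xpath->query('//font') as $font) {
    $p = $xpath->query('.//p', $font)->item(0);
    if ($p !== null) {   // guard against a <font> with no <p> inside
        $mytext = $p->nodeValue;
    }
    break;
}

echo $mytext;
?>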
answered 2013-06-18T18:10:38.330