php - SIMPLE HTML DOM - How to ignore nested elements?

Question

My HTML looks like this:

<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>

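What I want to do is extract "i want this text" and leave all the other elements behind. I have tried several iterations of the following, but none of them returns the text I need: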

$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);

Some guidance would be much appreciated, as the section on filters in the simple_html_dom documentation is very sparse.

score 1 · Accepted Answer

How about using PHP's preg_match (http://php.net/manual/en/function.preg-match.php)?

Try the following:

<?php

$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;

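// preg_match() returns 1 when the pattern matches; only then is $matches[1] set.
if (preg_match('#class="phone".*\n(.*)#', $html, $matches) === 1) {
    // Group 1 is the line after class="phone"; trim() drops surrounding whitespace.
    echo trim($matches[1]);
}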

?>

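Regex explained: find the text class="phone", then continue to the end of that line, matching any characters with .* (by default . does not match a newline). Then \n moves on to the next line, and wrapping .* in parentheses captures everything on that line.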
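To make the pieces of the pattern easier to see, here is a minimal sketch of the same regex written with the x (extended) modifier, which lets the pattern carry comments. The ~ delimiter and the names $pattern and $m are my own choices for illustration, not part of the answer above; ~ is used because # starts a comment in extended mode.

```php
<?php
// Same sample markup as in the question.
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;

// Extended mode: whitespace in the pattern is ignored and # starts a comment.
$pattern = '~
    class="phone"   # literal attribute text
    .*              # rest of that line; the dot stops at a line break
    \n              # the line break itself
    (.*)            # capture group 1: everything on the next line
~x';

if (preg_match($pattern, $html, $m) === 1) {
    echo trim($m[1]), PHP_EOL;
}
```

This relies on the wanted text sitting on the line directly after the opening tag, as it does in the sample markup.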

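The results are stored in the array $matches. $matches[0] holds the text matched by the whole regular expression, while $matches[1] holds the text captured by the parentheses.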
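As a rough illustration of how $matches might be used safely, the sketch below wraps the call in a helper that returns null when nothing matches. The function name first_line_after_phone is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the line
// that follows class="phone", or null when the pattern does not match.
function first_line_after_phone(string $html): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('#class="phone".*\n(.*)#', $html, $matches) !== 1) {
        return null;
    }
    // $matches[0] is the full match: class="phone" through the end of the next line.
    // $matches[1] is only what the parenthesised group captured.
    return trim($matches[1]);
}

// Shortened version of the sample markup, with explicit line breaks.
$html = "<span class=\"phone\">\ni want this text\n"
      . "<span class=\"ignore-this-one\">01234567890</span>\n</span>";

var_dump(first_line_after_phone($html));
var_dump(first_line_after_phone('<p>no phone here</p>'));
```

Checking the return value before reading $matches[1] avoids an undefined-index notice when the markup changes and the pattern no longer matches.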
