php - Is there a way to optimize finding text items on a page (not regex)

Question

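After seeing several threads tear apart the regex approach to finding terms to match within an HTML document, I used the Simple HTML DOM PHP parser ( http://simplehtmldom.sourceforge.net/ ) to get the bits of text I'm after, but I'd like to know whether my code is optimal. It feels like I'm looping too many times. Is there a way to optimize the following loop?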

    // Get the HTML and look at the text nodes
    $html = str_get_html($buffer);
    // First we match the <body> tag as we don't want to change the <head> items
    foreach ($html->find('body') as $body) {
        // Then we get the text nodes, rather than any HTML
        foreach ($body->find('text') as $text) {
            // Then we match each term
            foreach ($terms as $term) {
                // Match to the terms within the text nodes
                $text->outertext = str_replace($term, '<span class="highlight">'.$term.'</span>', $text->outertext);
            }
        }
    }

For example, would it make any difference to check whether there are any matches at all before starting the loop?

score 0 · Accepted Answer

You don't need the outer foreach loop; a well-formed document generally has only one body tag. Instead, just use $body = $html->find('body', 0);

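However, since a loop with only one iteration is essentially equivalent at run time to not looping at all, it probably won't have much effect on performance either way. So even in your original code you effectively have only 2 nested loops, not 3.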
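The innermost loop over $terms can also be collapsed. What follows is only a minimal sketch under a few assumptions: highlight_terms is a hypothetical helper name (it is not part of Simple HTML DOM), and it swaps the question's repeated str_replace calls for PHP's built-in strtr. With an array argument, strtr replaces all terms in a single pass, tries the longest key first, and does not rescan text it has already replaced, so a term such as "span" or "class" cannot match inside the markup inserted for an earlier term. Note that this differs from the original behaviour when terms overlap.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): wraps every occurrence
// of each term in a highlight span, using one strtr() pass per text node.
function highlight_terms(string $text, array $terms): string
{
    $map = [];
    foreach ($terms as $term) {
        // Skip empty terms: an empty key makes strtr() fail on older PHP versions.
        if ($term !== '') {
            $map[$term] = '<span class="highlight">' . $term . '</span>';
        }
    }

    // Nothing to replace: return the text untouched.
    if ($map === []) {
        return $text;
    }

    // Single pass, longest term first, already-replaced output is not rescanned.
    return strtr($text, $map);
}

// Usage with Simple HTML DOM, assuming simple_html_dom.php is loaded and
// $buffer and $terms are defined as in the question:
//
//   $html = str_get_html($buffer);
//   $body = $html->find('body', 0);
//   if ($body) {
//       foreach ($body->find('text') as $text) {
//           $text->outertext = highlight_terms($text->outertext, $terms);
//       }
//   }
```

Whether this is measurably faster depends on the number of terms and text nodes, so it is worth profiling against the original before adopting it.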

score 0 · Accepted Answer

Speaking from ignorance here: does find accept arbitrary XPath expressions? If so, you could fold the two outer loops into one:

foreach($html->find('body/text') as $body) {
    ...
}
