I am using file_get_contents to fetch the HTML source of a remote page with the following code:

<?php
    // Fetch the remote page
    $url = "remotesite/static/section35.html";
    $html = file_get_contents($url);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings from malformed HTML
    $doc->loadHTML($html);

    $elements = $doc->getElementsByTagName('tbody');

    $toRemove = array();

    // Collect the parent of every tbody whose text contains 'desktop' or 'Recommended'
    foreach ($elements as $el) {
        $matches = strpos($el->nodeValue, 'desktop') !== false
            || strpos($el->nodeValue, 'Recommended') !== false;
        if ($matches && !in_array($el->parentNode, $toRemove, true)) {
            $toRemove[] = $el->parentNode;
        }
    }

    // Remove the collected nodes from the document
    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }

    echo $doc->saveHTML(); // output the modified HTML
?>

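What I want to do now is remove every closing h3 tag (</h3>) from the source before echoing it to my page, because that is the only way the content displays correctly.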

1 Answer

echo str_replace('</h3>','',$doc->saveHTML());
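
This works because saveHTML() returns the document as a plain string, so the closing tags can be stripped from it before it is echoed. Below is a minimal sketch of the same idea that runs without the remote page; the inline $html sample is a made-up stand-in, not the actual remote markup. Note that str_replace is case-sensitive, which should be fine here as far as I know, since saveHTML() writes tag names in lowercase.

```php
<?php
// Hypothetical sample markup standing in for the remote page (assumption).
$html = '<html><body><h3>Title</h3><p>Text</p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();

// saveHTML() returns a string; remove every closing h3 tag from it.
$output = str_replace('</h3>', '', $doc->saveHTML());

echo $output;
```

In the question's code, the only change needed is to replace the final echo line with the str_replace call above.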
answered 2013-05-13T10:05:21.837