I need to extract some values and some raw HTML from an HTML document. I thought about using XPath, but I can't get my query to work.

Here is what I'm trying to achieve:

<div class="unit-id">
   <div class="title">
      some title-1
   </div>

   <div class="another-class">
      another class
   </div>
   <p>segwegw1<p>
   <p>segwegw1<p>
   <p>segwegw1<p>
   <p>segwegw1<p>
   <ul>
     <li>jfjfj</li>
     <li>jfjfj</li>
     <li>jfjfj</li>
   </ul>
</div>


<div class="unit-id">
   <div class="title">
      some title-2
   </div>
   <div class="another-class">
      some other class
   </div>
   <p>segwegw2<p>
   <p>segwegw2<p>
   <p>segwegw2<p>
   <p>segwegw2<p>
</div>


<div class="unit-id">
   <div class="title">
      some title-3
   </div>
   <div class="some-other-class">
      some other data
   </div>
   <p>segwegw3<p>
   <p>segwegw3<p>
   <p>segwegw3<p>
   <p>segwegw3<p>
</div>

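So I'd like a query that iterates over each div with the class unit-id and, for each one, returns the value of the div with the class title plus the rest of the HTML, excluding any other divs. In other words, just the p tags and the ul inside that particular unit-id div, and then on to the next iteration.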

Is that possible? Could you give me an example of how to write this query? Is there a better way?


1 Answer


This code does something close to what you're looking for:

function get_content($data){
    $doc = new DOMDocument();
    // Load the HTML string into the document object;
    // @ suppresses the parser warnings raised by malformed markup
    if ( ! @$doc->loadHTML($data)){
        return FALSE;
    }
    // Create an XPath object bound to the document
    $xpath = new DOMXPath($doc);
    $query = "//div[@class='unit-id']";
    // XPath queries return a DOMNodeList
    $res = $xpath->query($query);
    $out = array();
    foreach ($res as $key => $node){
        // Subquery, evaluated relative to the current unit-id div
        $sub = $xpath->query('.//div[@class="title"]', $node);
        // Guard against a unit that has no title div
        $title = $sub->item(0);
        $out[$key]['title'] = $title ? trim($title->nodeValue) : NULL;
        // Text of every <p> inside this unit
        foreach ($node->getElementsByTagName('p') as $key2 => $value){
            $out[$key]['par'][$key2] = $value->nodeValue;
        }
        // Text of every <li> inside this unit
        foreach ($node->getElementsByTagName('li') as $key2 => $value){
            $out[$key]['list'][$key2] = $value->nodeValue;
        }
    }
    return $out;
}

Note that your HTML has an error: the closing paragraph tags are missing their slash. Each one should be written </p>.

Here is the output:

array
  0 => 
    array
      'title' => string 'some title-1' (length=12)
      'par' => 
        array
          0 => string 'segwegw1' (length=8)
          1 => string 'segwegw1' (length=8)
          2 => string 'segwegw1' (length=8)
          3 => string 'segwegw1' (length=8)
      'list' => 
        array
          0 => string 'jfjfj' (length=5)
          1 => string 'jfjfj' (length=5)
          2 => string 'jfjfj' (length=5)
  1 => 
    array
      'title' => string 'some title-2' (length=12)
      'par' => 
        array
          0 => string 'segwegw2' (length=8)
          1 => string 'segwegw2' (length=8)
          2 => string 'segwegw2' (length=8)
          3 => string 'segwegw2' (length=8)
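
The function above returns text values only. Since the question also asks for the remaining HTML of each unit, here is a minimal sketch of one way to get it, not a definitive implementation. The helper name get_units_html and the result keys 'title' and 'html' are made up for illustration. It assumes the p and ul elements are direct children of the unit-id div, and it matches class names as whitespace-separated tokens so that a div with several classes still matches.

```php
<?php
// Hypothetical helper (not part of the answer above): returns each unit's
// title plus the raw HTML of its direct <p> and <ul> children.
function get_units_html(string $data): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($data);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    // Token-based class tests: match "unit-id" even in class="unit-id foo"
    $unitQuery  = "//div[contains(concat(' ', normalize-space(@class), ' '), ' unit-id ')]";
    $titleQuery = "./div[contains(concat(' ', normalize-space(@class), ' '), ' title ')]";

    $out = [];
    foreach ($xpath->query($unitQuery) as $unit) {
        $title = $xpath->query($titleQuery, $unit)->item(0);

        // Serialize only the direct <p> and <ul> children, in document order;
        // nested divs are never selected, so they are left out
        $fragment = '';
        foreach ($xpath->query('./p | ./ul', $unit) as $child) {
            $fragment .= $doc->saveHTML($child);
        }

        $out[] = [
            'title' => $title !== null ? trim($title->textContent) : null,
            'html'  => $fragment,
        ];
    }
    return $out;
}
```

Calling get_units_html($data) on markup with correctly closed paragraphs should give one entry per unit-id div. With the unclosed <p> tags from the question, the parser may repair the markup in ways that add extra empty paragraphs, so it is worth fixing the source HTML first.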
Answered 2013-05-13T20:24:54.907