php - PHP screen scraping with PHP Simple HTML DOM Parser

Question

I'm using Simple HTML DOM Parser to scrape a website. How can I skip elements with a specific class while looping over them?

score 1 · Accepted Answer

According to http://simplehtmldom.sourceforge.net/manual.htm#frag_find_attr, you can use:

->find("div[class!=skip_me]")

Alternatively, use the DOM methods and check the value returned by ->getAttribute("class").
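A minimal sketch of the DOM approach, using PHP's built-in DOM extension rather than Simple HTML DOM. The sample HTML and the class name `skip_me` are hypothetical. Unlike an exact comparison on the whole attribute, it splits the `class` attribute into tokens, so an element with `class="item skip_me"` is skipped too.

```php
<?php
// Hypothetical sample markup; "skip_me" is the class we want to skip.
$html = '<div class="item">A</div><div class="skip_me">B</div>'
      . '<div class="item skip_me">C</div><div>D</div>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$kept = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    // getAttribute() returns '' when the attribute is missing.
    $classes = preg_split('/\s+/', trim($div->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    if (in_array('skip_me', $classes, true)) {
        continue; // skip any element carrying the unwanted class
    }
    $kept[] = $div->textContent;
}

echo implode(',', $kept), "\n"; // A,D
```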

score 0 · Accepted Answer

  // DOM can load HTML soup, but HTML soup can trigger parser warnings,
  // so suppress them.
  $htmlDom = new DOMDocument();
  if (@$htmlDom->loadHTML($html)) {
    // It's much easier to work with SimpleXML than DOM; luckily enough,
    // we can simply import our DOM tree.
    $elements = simplexml_import_dom($htmlDom);
  }

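This is (almost) a verbatim quote from Drupal 7's SimpleTest. After that, the document is much easier to work with: for each element, its class attribute can be read as $element['class'].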
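A sketch of how the imported tree might be looped over while skipping a class, assuming hypothetical sample HTML and the class name `skip_me`. It uses SimpleXML's `xpath()` to find the `div` elements and compares `$element['class']` as a string. This is an exact match on the whole attribute value, so an element with several classes (e.g. `class="item skip_me"`) would not be skipped.

```php
<?php
// Hypothetical sample markup; "skip_me" is the class we want to skip.
$html = '<html><body><div class="item">A</div><div class="skip_me">B</div><div>C</div></body></html>';

$htmlDom = new DOMDocument();
// Keep libxml warnings about malformed HTML out of the output.
libxml_use_internal_errors(true);
$loaded = $htmlDom->loadHTML($html);
libxml_clear_errors();

$kept = [];
if ($loaded) {
    $elements = simplexml_import_dom($htmlDom);
    foreach ($elements->xpath('//div') as $element) {
        // $element['class'] is null when the attribute is absent; casting gives ''.
        if ((string) $element['class'] === 'skip_me') {
            continue;
        }
        $kept[] = (string) $element;
    }
}

echo implode(',', $kept), "\n"; // A,C
```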
