php - PHP cURL to PHP DOMDocument

Question

This is a sample of the code I extracted from a web page...

<div class="user-details-narrow">
            <div class="profileheadtitle">
                <span class=" headline txtBlue size15">
                    Profession
                </span>
            </div>
            <div class="profileheadcontent-narrow">
                <span class="txtGrey size15">
                    administration
                </span>
            </div>
        </div>

<div class="user-details-narrow">
            <div class="profileheadtitle">
                <span class=" headline txtBlue size15">
                    Industry
                </span>
            </div>
            <div class="profileheadcontent-narrow">
                <span class="txtGrey size15">
                    banking
                </span>
            </div>
        </div>

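What I want to achieve is to extract the data inside these DIVs. For example...

Profession = administration
Industry = banking

Currently I am pulling the web page with cURL, then stripping the HTML tags and running hundreds of preg_match calls and if statements. The solution works, but it uses a lot of CPU and memory.

Someone suggested I use DOMDocument instead, but I can't seem to get it working, mainly due to lack of knowledge.

Can someone show me how to extract this data?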

score 0 · Accepted Answer

Posting my earlier comment as a possible answer, with an explanation of why I think this is the way to solve the problem:

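$dom = new DOMDocument;
$dom->loadHTML($theHtmlString);
//get all profileheadtitle nodes
//they seem to contain the first bits of info you're after
$xpath = new DOMXPath($dom);
$titles = $xpath->query('//*[@class="profileheadtitle"]');
//let's iterate over them, using the `textContent` property to get the value
foreach ($titles as $div)
{
    //each node also has a second div right next to it
    //it's on the same level and we need its content, too
    //enter the DOMNode::$nextSibling property
    //(it may point at a whitespace text node first, so skip ahead to the next element)
    $content = $div->nextSibling;
    while ($content !== null && $content->nodeType !== XML_ELEMENT_NODE)
    {
        $content = $content->nextSibling;
    }
    //trim() strips the indentation and newlines surrounding the text
    echo trim($div->textContent) . ' = ' . ($content ? trim($content->textContent) : '') . PHP_EOL;
}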

Job done. Check the documentation for the DOMNode class for details; you might also want to read up on the DOMXPath class.
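If you would rather collect the values than echo them, the same idea can be wrapped in a function. This is only a minimal sketch: the function name extractProfileFields is made up for illustration, the sample markup is a condensed copy of the snippet in the question, and it assumes the content div is always the first div sibling after the title div.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns title => content pairs.
function extractProfileFields(string $html): array
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally; real-world markup is rarely clean.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $fields = [];
    foreach ($xpath->query('//div[@class="profileheadtitle"]') as $title) {
        // Relative query: first div sibling that follows the title div.
        // Unlike $nextSibling, this never lands on a whitespace text node.
        $content = $xpath->query('following-sibling::div[1]', $title)->item(0);
        if ($content !== null) {
            $fields[trim($title->textContent)] = trim($content->textContent);
        }
    }
    return $fields;
}

// Condensed version of the markup from the question.
$html = <<<'HTML'
<div class="user-details-narrow">
    <div class="profileheadtitle"><span class=" headline txtBlue size15"> Profession </span></div>
    <div class="profileheadcontent-narrow"><span class="txtGrey size15"> administration </span></div>
</div>
<div class="user-details-narrow">
    <div class="profileheadtitle"><span class=" headline txtBlue size15"> Industry </span></div>
    <div class="profileheadcontent-narrow"><span class="txtGrey size15"> banking </span></div>
</div>
HTML;

foreach (extractProfileFields($html) as $name => $value) {
    echo $name . ' = ' . $value . PHP_EOL;
}
```

In your case the string passed in would be whatever curl_exec() returns once CURLOPT_RETURNTRANSFER is set, so the tag-stripping and preg_match steps should no longer be needed.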

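Note that this bit: $xpath->query('//*[@class="profileheadtitle"]'); queries the DOM for all nodes that have the class profileheadtitle. If you want to restrict the result to <div> elements with this class, you can write it like this: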

$xpath->query('//div[@class="profileheadtitle"]');

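It is also important to understand that, while this works, this XPath expression will stop matching if some (or all) of the divs have more than one class. It only returns nodes whose class attribute is exactly profileheadtitle. The more academically correct way is to write it like this: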

$xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), concat(" ", "profileheadtitle", " "))]'
);

This will be able to handle nodes that carry several classes, such as:

<div id="bar" class="foo profileheadtitle mark-red" style="border: 1px solid black;"></div>
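To see the difference between the two expressions, here is a small sketch that runs both against three divs. Only the "bar" div comes from the example above; the "foo" and "baz" divs and the matchedIds helper are invented for the comparison.

```php
<?php
// Hypothetical helper: returns the id attribute of every node an expression matches.
function matchedIds(DOMXPath $xpath, string $expression): array
{
    $ids = [];
    foreach ($xpath->query($expression) as $node) {
        $ids[] = $node->getAttribute('id');
    }
    return $ids;
}

$html = <<<'HTML'
<div id="foo" class="profileheadtitle">one class</div>
<div id="bar" class="foo profileheadtitle mark-red" style="border: 1px solid black;">several classes</div>
<div id="baz" class="profileheadtitle-wide">a different class with the same prefix</div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Compares the whole attribute value, so extra classes break the match.
$exact = '//div[@class="profileheadtitle"]';
// Pads the class list with spaces and looks for the name as a whole token.
$token = '//div[contains(concat(" ", normalize-space(@class), " "), " profileheadtitle ")]';

echo 'exact: ' . implode(', ', matchedIds($xpath, $exact)) . PHP_EOL; // exact: foo
echo 'token: ' . implode(', ', matchedIds($xpath, $token)) . PHP_EOL; // token: foo, bar
```

The padding with spaces is what keeps "baz" out: a plain contains(@class, "profileheadtitle") would also match profileheadtitle-wide.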
