php - Using Xpath with PHP to parse HTML

Question

I'm currently trying to parse some data from a forum. Here is the code:

$xml = simplexml_load_file('https://forums.eveonline.com');

$names = $xml->xpath("html/body/div/div/form/div/div/div/div/div[*]/div/div/table//tr/td[@class='topicViews']");
foreach($names as $name) 
{
    echo $name . "<br/>";
}

Anyway, the problem is that I'm using an XPath extension for Google Chrome to help me get the path, and I'm guessing that Chrome is changing the HTML enough that the path no longer matches when my website runs this query. Is there some way I can make the host look at the site through Google Chrome so that it gets the right code? What would you suggest?

Thanks!

score 43 · Accepted Answer

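My suggestion is to always use DOMDocument rather than SimpleXML, since it is a much nicer interface to work with and makes tasks a lot more intuitive.

The following example shows how to load the HTML into a DOMDocument object and query the DOM using XPath. All you really need to do is find every td element with a class name of topicViews; the code then outputs the nodeValue of each node in the DOMNodeList returned by the XPath query.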

/* Use internal libxml errors -- turn on in production, off for debugging */
libxml_use_internal_errors(true);
/* Create a new DOMDocument object */
$dom = new DomDocument;
/* Load the HTML */
$dom->loadHTMLFile("https://forums.eveonline.com");
/* Create a new XPath object */
$xpath = new DomXPath($dom);
/* Query all <td> nodes containing specified class name */
$nodes = $xpath->query("//td[@class='topicViews']");
/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($nodes as $i => $node) {
    echo "Node($i): ", $node->nodeValue, "\n";
}
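
One caveat with the query above: `@class='topicViews'` only matches when the attribute is exactly that string. As a minimal sketch, assuming some cells might carry more than one class, the usual XPath 1.0 token-matching idiom looks like this. The inline HTML and the extra `hot` class are made up for illustration and are not taken from the real forum page.

```php
<?php
// Hypothetical markup standing in for the forum page; not the real HTML.
$html = <<<HTML
<html><body><table>
  <tr><td class="topicViews">120</td><td class="topicReplies">4</td></tr>
  <tr><td class="topicViews hot">987</td><td class="topicReplies">31</td></tr>
</table></body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact attribute match: only cells whose class attribute is exactly "topicViews".
$exact = $xpath->query("//td[@class='topicViews']");

// Token match: cells whose space-separated class list contains "topicViews".
$token = $xpath->query(
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' topicViews ')]"
);

// Collect the text of every token-matched cell.
$views = [];
foreach ($token as $node) {
    $views[] = trim($node->nodeValue);
}

echo "Exact matches: ", $exact->length, "\n";
echo "Token matches: ", $token->length, "\n";
echo "Values: ", implode(', ', $views), "\n";
```

If the page only ever uses the single class, the simpler exact match in the answer is enough.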

score 3 · Accepted Answer

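A double slash ('//') makes XPath search through all descendants instead of only direct children. So if you use the XPath '//table', you get every table in the document. You can also use it deeper in your XPath expression, for example 'html/body/div/div/form//table', to get all the tables under 'html/body/div/div/form'.

This way you can make your code more resilient to changes in the HTML source.

If you want to use XPath, I do suggest learning a bit about it. Copy-pasting only gets you so far.

A simple explanation of the syntax can be found at w3schools.com/xml/xpath_syntax.asp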
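
To illustrate the difference, here is a small sketch run against made-up inline HTML. The markup, the `nav` and `topics` ids, and the shortened path are assumptions for the example, not the real forum structure.

```php
<?php
// Hypothetical markup for illustration only; not the real forum HTML.
$html = <<<HTML
<html><body>
  <table id="nav"><tr><td>menu</td></tr></table>
  <div><form>
    <div><table id="topics"><tr><td>topic</td></tr></table></div>
  </form></div>
</body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// '//table' matches every table in the document, at any depth.
$allTables = $xpath->query("//table");

// '//' after a fixed prefix matches tables at any depth below that prefix only.
$formTables = $xpath->query("/html/body/div/form//table");

// Collect the id of each table found under the form.
$formIds = [];
foreach ($formTables as $table) {
    $formIds[] = $table->getAttribute('id');
}

echo "All tables: ", $allTables->length, "\n";
echo "Tables under the form: ", implode(', ', $formIds), "\n";
```

The second query keeps working if extra wrapper divs are added between the form and the table, which a fully spelled-out path would not.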
