3

I have a set of HTML items to parse. I need to extract the contents of every div whose class name ends with "uid-g-uid". Here are some sample divs:

<div class="uid-g-uid">1121</div>

<div class="yskisghuid-g-uid">14234</div>

<div class="kif893jduid-g-uid">114235</div>

I tried the following combinations, but neither worked:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class=ends-with(., "uid-g-uid")]');

I also tried:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class="*uid-g-uid"]');

Please help!

4 Answers

3

ends-with() requires XPath 2.0, so it does not work with DOMXPath, which only supports XPath 1.0. Something like this should work:

$xpath->query('//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]');
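
The "- 8" in that expression comes from XPath's substring() being 1-based: for a 9-character suffix, the last 9 characters start at string-length(@class) - 8. As a minimal sketch of the same idea, the snippet below derives the offset from the suffix length instead of hard-coding it. The fourth div and the variable names ($suffix, $offset, $values) are assumptions added for illustration, not part of the question.

```php
<?php
// Sample markup: the three divs from the question plus one
// non-matching div (hypothetical, added for illustration).
$html = '<div class="uid-g-uid">1121</div>'
      . '<div class="yskisghuid-g-uid">14234</div>'
      . '<div class="kif893jduid-g-uid">114235</div>'
      . '<div class="uid-g-uid-other">0</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$suffix = 'uid-g-uid';
// substring() positions are 1-based, so the suffix starts at
// string-length(@class) - (suffix length - 1).
$offset = strlen($suffix) - 1; // 8 for this suffix
$query  = sprintf(
    '//*[substring(@class, string-length(@class) - %d) = "%s"]',
    $offset,
    $suffix
);

$values = [];
foreach ($xpath->query($query) as $node) {
    $values[] = $node->nodeValue;
}
print_r($values);
```

With this sample input, $values should hold 1121, 14234 and 114235; the div whose class merely starts with the string is skipped. Note that sprintf() is only safe here because the suffix contains no double quotes.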
answered 2013-04-09T12:29:31.830
2

You want an XPath 1.0 query that checks whether a string ends with a given string. The ends-with() string function is not available in that version.

I can see several ways to do this. If, as in your case, the substring only ever occurs once and always at the end, you can simply use contains():

//*[contains(@class, "uid-g-uid")]

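If the substring could also appear somewhere else in the value and you do not want those matches, additionally check that nothing follows it: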

//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]

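If it can even occur more than once, that will not work either, because substring-after() splits at the first occurrence. In that case, check directly whether the string ends with it: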

//@class[substring(., string-length(.) - 8, 9) = "uid-g-uid"]/..

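This may even be the most straightforward variant. Alternatively, since the third argument of substring() is optional, you can omit it and compare everything through to the end of the string: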

//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..
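
To make the difference between these variants concrete, here is a small sketch that runs three of the expressions above against edge cases. The sample class names and the $queries and $results names are made up for illustration; only the XPath expressions come from this answer.

```php
<?php
// Hypothetical edge cases: suffix once at the end, suffix only at
// the start, and suffix occurring twice (the second time at the end).
$html = '<div class="kif893jduid-g-uid">ends-once</div>'
      . '<div class="uid-g-uid-extra">middle-only</div>'
      . '<div class="uid-g-uid-x-uid-g-uid">twice</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml warnings out of the output
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$queries = [
    'contains'        => '//*[contains(@class, "uid-g-uid")]',
    'substring-after' => '//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]',
    'substring'       => '//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..',
];

$results = [];
foreach ($queries as $name => $query) {
    $results[$name] = [];
    foreach ($xpath->query($query) as $node) {
        $results[$name][] = $node->nodeValue; // text content of the matched div
    }
}
print_r($results);
```

On this input, contains() should match all three divs, the substring-after() check only "ends-once", and the substring() check both "ends-once" and "twice", which is why the last form is the safest general answer.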
answered 2013-04-09T12:49:28.680
2

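Since you are looking for an XPath function that is not available in XPath 1.0, I think you can use the DOMXPath::registerPhpFunctions feature that PHP provides to call any PHP function from your XPath query. With it you can even call preg_match, like this: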

$html = <<< EOF
<div class="uid-g-uid">1121</div>
<div class="yskisghuid-g-uid">14234</div>
<div class="kif893jduid-g-uid">114235</div>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// Register the php: namespace (required)
$xpath->registerNamespace("php", "http://php.net/xpath");

// Register PHP preg_match function
$xpath->registerPHPFunctions('preg_match');

// call PHP preg_match function on your xpath to make sure class ends
// with the string "uid-g-uid" using regex "/uid-g-uid$/"
$nlist = $xpath->evaluate('//div[php:functionString("preg_match",
                           "/uid-g-uid$/", @class) = 1]/text()');

$numnodes = $nlist->length; // no of divs matched
for($i=0; $i < $numnodes; $i++) { // run the loop on matched divs
   $node = $nlist->item($i);
   echo "val: " . $node->nodeValue . "\n";
}
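
The same mechanism also accepts user-defined functions, which avoids the regex entirely. The following is only a sketch under that assumption: class_ends_with() is a hypothetical helper written for this example, not a built-in, and the extra "unrelated" div is added sample data.

```php
<?php
// Hypothetical helper: true when $class ends with $suffix.
function class_ends_with($class, $suffix)
{
    return $suffix === '' || substr($class, -strlen($suffix)) === $suffix;
}

$html = '<div class="uid-g-uid">1121</div>'
      . '<div class="yskisghuid-g-uid">14234</div>'
      . '<div class="unrelated">0</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The php: namespace must be registered before php:functionString is usable.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Allow only this one function to be called from XPath.
$xpath->registerPhpFunctions('class_ends_with');

// functionString passes its arguments to PHP as plain strings.
$nodes = $xpath->query(
    '//div[php:functionString("class_ends_with", @class, "uid-g-uid")]'
);

$matched = [];
foreach ($nodes as $node) {
    $matched[] = $node->nodeValue;
}
print_r($matched);
```

Restricting registerPhpFunctions() to a named function, as both this sketch and the answer do, keeps the XPath expression from calling arbitrary PHP code.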
answered 2013-04-09T12:54:18.910
1

Try this:

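#/ First, use a regex to replace each matching class attribute with a findable flag.
#/ [^"]* keeps the match inside one attribute value, so it cannot run across tags.
$bdy = preg_replace('/class="[^"]*uid-g-uid"/i', 'class="__FINDME__"', $bdy);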

#/ Now find the new flag name instead
$dom = new DOMDocument();
@$dom->loadHTML($bdy);
$xpath = new DOMXPath($dom);

$divs = $xpath->evaluate("//div[@class = '__FINDME__']");
var_dump($divs->length); // debug check: length should be >= 1, otherwise nothing matched

for($j=0; $j<$divs->length; $j++)
{
    $div = $divs->item($j);
    $div_value = $div->nodeValue;
    // ...
}
answered 2013-04-09T12:17:54.467