php - What is the difference between these 2 XPath queries?

Question

Both queries retrieve pictures that have been liked more than 5 times. What I think is that query 1 defines the relative path while query 2 doesn't, and that's why query 2 uses "ancestor". Am I right? Which is better?

First Query:

```
//div[@class="pin"]
     [.//span[@class = 'LikesCount']
             [substring-before(normalize-space(.),' ') > 5]]
     /div[@class="PinHolder"]
     /a/img
```

Second Query:

```
//div[@class="pin"]/p/span[@class="LikesCount"]
[substring-before(., " ") > 5]/ancestor::div[@class="pin"]/a/img
```

Here is the markup:

```
<div class="pin">

[...]

<div class="PinHolder">
<a href="/pin/56787645270909880/" class="PinImage ImgLink">
    <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg" 
         alt="Krizia" 
         data-componenttype="MODAL_PIN" 
         class="PinImageImg" 
         style="height: 288px;">
</a>
</div>

<p class="stats colorless">
    <span class="LikesCount"> 
        22 likes 
    </span>
    <span class="RepinsCount">
        6 repins
    </span>
</p>

[...]

</div>
```

score 2 · Accepted Answer

There are a few differences that could make them return different results, and a few that make no difference. Some differences will affect the values of the expressions (or expressions like them) in some input and not in others.

The first expression looks for (a subset of) the elements matching the following path:
```
//div[@class='pin']/div[@class='PinHolder']/a/img
```
The second looks for (a subset of) elements matching this path:
```
//div[@class='pin']/a/img
```
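So the two can never return overlapping results: the a element's parent would have to be a div whose class attribute is both "pin" and "PinHolder". In the sample markup the a sits inside the div of class PinHolder, so the second expression selects nothing there.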
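A quick way to confirm this is to run the two paths, with their predicates removed, against the sample. The PHP sketch below assumes PHP's DOM extension (DOMDocument and DOMXPath) and uses a trimmed, well-formed copy of the markup: the [...] parts are dropped and the img is self-closed so that loadXML() accepts it.

```php
<?php
// Trimmed, well-formed copy of the question's markup ([...] parts omitted,
// <img> self-closed, some img attributes dropped).
$markup = <<<'XML'
<div class="pin">
  <div class="PinHolder">
    <a href="/pin/56787645270909880/" class="PinImage ImgLink">
      <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg" alt="Krizia" class="PinImageImg"/>
    </a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount">
        22 likes
    </span>
    <span class="RepinsCount">
        6 repins
    </span>
  </p>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

// Path underlying the first expression: the img sits inside div.PinHolder.
$first = $xpath->query("//div[@class='pin']/div[@class='PinHolder']/a/img");
// Path underlying the second expression: expects <a> as a direct child of div.pin.
$second = $xpath->query("//div[@class='pin']/a/img");

printf("first path:  %d node(s)\n", $first->length);
printf("second path: %d node(s)\n", $second->length);
```

On that markup the first path selects the Krizia image and the second selects nothing.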
In checking the condition on the span of class LikesCount, the first expression applies the normalize-space function to the span's string value before extracting the first token; the second doesn't. Given string values like the one in the sample markup (newline, eight blanks, "22 likes", blank, newline, four blanks), this makes a difference: substring-before(., " ") returns only the newline that precedes the first blank, that string converts to NaN, and NaN > 5 is false. After normalize-space, the first token is "22", and 22 > 5 is true. If the environment normalizes whitespace before evaluating the XPath expression, it might not make a difference.
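To see the effect on a string value shaped like the one in the sample, this PHP sketch (DOMXPath again, with a small fragment modeled on the question's markup) evaluates both versions of the test directly:

```php
<?php
// Fragment modeled on the sample: the span's text starts with a newline
// followed by indentation.
$markup = <<<'XML'
<div class="pin">
  <p class="stats colorless">
    <span class="LikesCount">
        22 likes
    </span>
  </p>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

$span = "//span[@class='LikesCount']";

// Second expression's test: take the text before the first blank as-is ("\n").
$rawToken = $xpath->evaluate("substring-before($span, ' ')");
// First expression's test: trim and collapse whitespace first ("22").
$normToken = $xpath->evaluate("substring-before(normalize-space($span), ' ')");

// The comparison converts the token to a number: "\n" becomes NaN, and NaN > 5 is false.
$rawTest = $xpath->evaluate("substring-before($span, ' ') > 5");
$normTest = $xpath->evaluate("substring-before(normalize-space($span), ' ') > 5");

var_dump($rawToken === "\n", $normToken, $rawTest, $normTest);
```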
Each expression tests for a span element of class LikesCount; the first expression looks for such a span among all the descendants of the div with class pin, while the second looks for it only among grandchildren whose parent is a p. In the sample XML shown, these amount to the same thing (the first expression won't find any spans of class LikesCount that the second expression doesn't also find).
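The difference only shows up when a LikesCount span is not a child of a p. A sketch using hypothetical markup (pin "a" keeps its span in a p, pin "b" keeps it in a div) and leaving out the likes test; the ids helper is an illustrative name, not part of any library:

```php
<?php
// Hypothetical markup (not from the question).
$markup = <<<'XML'
<root>
  <div class="pin" id="a">
    <p class="stats"><span class="LikesCount">22 likes</span></p>
  </div>
  <div class="pin" id="b">
    <div class="stats"><span class="LikesCount">9 likes</span></div>
  </div>
</root>
XML;

// Collect the id attributes of the selected elements, sorted.
function ids(DOMNodeList $nodes): array
{
    $ids = [];
    foreach ($nodes as $node) {
        $ids[] = $node->getAttribute('id');
    }
    sort($ids);
    return $ids;
}

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

// First expression's style: the span may be any descendant of the pin.
$viaDescendant = ids($xpath->query("//div[@class='pin'][.//span[@class='LikesCount']]"));
// Second expression's style: the span must be a child of a p that is a child of the pin.
$viaChildPath = ids($xpath->query("//div[@class='pin']/p/span[@class='LikesCount']/ancestor::div[@class='pin']"));

echo implode(', ', $viaDescendant), "\n"; // prints "a, b"
echo implode(', ', $viaChildPath), "\n";  // prints "a"
```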
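Owing to the use of the ancestor axis, the second expression selects every div of class pin that encloses a matching span, not just the nearest one. That matters only in documents with nested div elements of class pin. Even then, the first expression's predicate uses the descendant axis (.//span), so it also matches every enclosing pin, and the two select the same pins once the other differences listed here are removed; the ancestor axis would make a visible difference only against a predicate that tested just the pin's own p/span children. If there are no nested div elements of class pin in the input, none of this arises.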
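A sketch for checking the nesting case, using hypothetical markup in which only the inner of two nested pins has more than 5 likes. Every variant uses normalize-space and the same likes test, so only the navigation differs; the ids helper and the $likes variable are illustrative names:

```php
<?php
// Hypothetical nested markup (not from the question).
$markup = <<<'XML'
<root>
  <div class="pin" id="outer">
    <p class="stats"><span class="LikesCount">1 likes</span></p>
    <div class="pin" id="inner">
      <p class="stats"><span class="LikesCount">22 likes</span></p>
    </div>
  </div>
</root>
XML;

// Collect the id attributes of the selected pins, sorted so order doesn't matter.
function ids(DOMNodeList $nodes): array
{
    $ids = [];
    foreach ($nodes as $node) {
        $ids[] = $node->getAttribute('id');
    }
    sort($ids);
    return $ids;
}

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

// Shared likes test (normalized in every variant).
$likes = "span[@class='LikesCount'][substring-before(normalize-space(.), ' ') > 5]";

// Second expression's style: down to the span, then back up with ancestor::.
$viaAncestor = ids($xpath->query("//div[@class='pin']/p/$likes/ancestor::div[@class='pin']"));
// First expression's style: predicate using the descendant axis.
$viaDescendant = ids($xpath->query("//div[@class='pin'][.//$likes]"));
// For contrast: predicate that only looks at the pin's own p/span children.
$viaChild = ids($xpath->query("//div[@class='pin'][p/$likes]"));

echo implode(', ', $viaAncestor), "\n";   // prints "inner, outer"
echo implode(', ', $viaDescendant), "\n"; // prints "inner, outer"
echo implode(', ', $viaChild), "\n";      // prints "inner"
```

The ancestor formulation and the descendant-predicate formulation select both pins; only the predicate limited to the pin's own p/span children leaves out the enclosing one.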
The first expression uses more whitespace than the second, in places where whitespace is not significant to XPath. This may make the first expression easier to read for some readers; it also makes it take more space (which some readers will dislike). The indentation of the predicates in expression 1 places predicates applied against the same nodes directly above and below each other, which may help some readers see that they apply to the same nodes. On rare occasions I have encountered XPath evaluators that were not prepared to handle insignificant whitespace in XPath expressions, and of course some programming languages require special treatment for multiline strings, so the multi-line indented form of the first expression may be hard to use in some environments, even though it's legal XPath.
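In PHP, a nowdoc string carries the indented form through unchanged. A sketch, assuming DOMXPath (which is built on libxml2) and the same trimmed, well-formed copy of the sample markup, that runs the first expression in its multi-line form and in a one-line form:

```php
<?php
// Trimmed, well-formed copy of the question's markup ([...] parts omitted).
$markup = <<<'XML'
<div class="pin">
  <div class="PinHolder">
    <a href="/pin/56787645270909880/" class="PinImage ImgLink">
      <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg" alt="Krizia"/>
    </a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount">
        22 likes
    </span>
  </p>
</div>
XML;

// The first expression exactly as laid out in the question.
$multiLine = <<<'XPATH'
//div[@class="pin"]
     [.//span[@class = 'LikesCount']
             [substring-before(normalize-space(.),' ') > 5]]
     /div[@class="PinHolder"]
     /a/img
XPATH;

// The same expression without the insignificant whitespace.
$oneLine = "//div[@class='pin'][.//span[@class='LikesCount'][substring-before(normalize-space(.),' ') > 5]]/div[@class='PinHolder']/a/img";

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

$multi = $xpath->query($multiLine);
$single = $xpath->query($oneLine);
printf("multi-line: %d, one-line: %d\n", $multi->length, $single->length);
```

Other evaluators and host languages may still need the one-line form.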

You write "What I think is that query 1 defines the relative path while query 2 don't and that's why query 2 uses "ancestor". Am I right?" I am not sure what you mean, so this attempt at an answer may not be helpful (sorry). The first expression puts the test concerning the likes-count span into a predicate on the div of class pin, while the second navigates down to the span, performs the test, and then navigates back up to the set of all ancestors that are div elements with class pin. In XML like that shown, these formulations have the same effect. When div elements of class pin can nest, whether they differ depends on the exact formulation of the test and the traversal: the ancestor step selects every enclosing pin, which a predicate using the descendant axis (.//span) also matches but a predicate testing only the pin's own p/span children does not.

You ask "Which is better?" Since they currently return different results, the first way to decide this is to ask "which of them expresses the question you want to ask?"

If they are modified to return the same results, the question becomes "which of them do you find easier to understand?", so that when maintaining the code you or your successors can see what is going on.
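For example, one possible (hypothetical) modification, assuming the img inside the div of class PinHolder is what you want: add normalize-space to the second expression and route its final steps through div[@class="PinHolder"]. This PHP sketch, again on a trimmed, well-formed copy of the sample markup, checks that both then select the same img:

```php
<?php
// Trimmed, well-formed copy of the question's markup ([...] parts omitted).
$markup = <<<'XML'
<div class="pin">
  <div class="PinHolder">
    <a href="/pin/56787645270909880/" class="PinImage ImgLink">
      <img src="http://media-cache-ec3.pinterest.com/upload/56787645270909880_d7AaHYHA_b.jpg" alt="Krizia"/>
    </a>
  </div>
  <p class="stats colorless">
    <span class="LikesCount">
        22 likes
    </span>
  </p>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

// First expression from the question, on one line.
$first = "//div[@class='pin'][.//span[@class='LikesCount'][substring-before(normalize-space(.), ' ') > 5]]"
       . "/div[@class='PinHolder']/a/img";
// Hypothetical rewrite of the second expression: normalize-space() added, and
// the last steps go through div.PinHolder like the first expression.
$second = "//div[@class='pin']/p/span[@class='LikesCount'][substring-before(normalize-space(.), ' ') > 5]"
        . "/ancestor::div[@class='pin']/div[@class='PinHolder']/a/img";

$a = $xpath->query($first);
$b = $xpath->query($second);

// Report how many nodes each selects and whether they picked the same img.
printf("%d %d %s\n", $a->length, $b->length,
    $a->item(0)->isSameNode($b->item(0)) ? 'same node' : 'different nodes');
```

With equal results on this input, choosing between them comes down to which reads more clearly to you.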
