I have been struggling with this problem for the past couple of days. I want to get all the text() nodes from an HTML document, but I only want the XPath of the element that contains each text node. Example:

 foreach (var textNode in node.SelectNodes(".//text()")) 
 //do stuff here 

However, when it comes to retrieving the XPath of the textNode using textNode.XPath, I get the full XPath including the #text node:

/html[1]/body[1]/div[1]/a[1]/#text

Yet I only want the containing node of the text, for example:

/html[1]/body[1]/div[1]/a[1]

Could anyone point me toward a better XPath solution that retrieves all nodes that contain text, but gives the XPath only up to the containing node?

2 Answers

Instead of:

.//text() 

use:

.//*[normalize-space(text())]

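This selects every descendant element of the context (current) node whose first text-node child contains non-whitespace text, so the XPath of each selected node is already the path of the containing element. In XPath 1.0, normalize-space(text()) only looks at the first text-node child; to match an element when any of its text-node children is non-whitespace, use .//*[text()[normalize-space()]] instead.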
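
As a rough sketch of how this expression might be used, the snippet below assumes the HtmlAgilityPack library (suggested by SelectNodes and the XPath property in the question, but not named there). The sample markup and the class name are made up for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package

class TextContainers
{
    static void Main()
    {
        // Hypothetical sample document: one element with text, one with whitespace only.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div><a href='#'>link text</a><p>   </p></div></body></html>");

        // Select elements rather than text nodes, so XPath ends at the element.
        HtmlNodeCollection elements =
            doc.DocumentNode.SelectNodes(".//*[normalize-space(text())]");

        // SelectNodes returns null when nothing matches.
        if (elements == null) return;

        foreach (HtmlNode element in elements)
            Console.WriteLine(element.XPath);
    }
}
```

Because the selected nodes are elements, no post-processing of the path string is needed.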

Answered 2013-03-20T03:59:13.403

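Why not split the path and drop the last segment (the #text step)?

 // "/a[1]/#text" splits into ["", "a[1]", "#text"]; keep all but the last element.
 string[] elements = textNode.XPath.Split('/');
 return string.Join("/", elements, 0, elements.Length - 1);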
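
An alternative that avoids string manipulation, sketched here under the assumption that the nodes are HtmlAgilityPack HtmlNode objects, is to ask each text node for its parent and read the parent's XPath. The sample markup, the class name, and the HashSet used to skip duplicate parents are illustrative choices, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package

class ParentPaths
{
    static void Main()
    {
        // Hypothetical sample: the div has several text-node children.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div><a href='#'>link</a> and <b>bold</b> text</div></body></html>");

        // Only text nodes that contain something other than whitespace.
        HtmlNodeCollection textNodes =
            doc.DocumentNode.SelectNodes(".//text()[normalize-space()]");
        if (textNodes == null) return; // null means no match

        // An element with several text children would otherwise be listed repeatedly.
        var seen = new HashSet<string>();
        foreach (HtmlNode textNode in textNodes)
        {
            string path = textNode.ParentNode.XPath;
            if (seen.Add(path))
                Console.WriteLine(path);
        }
    }
}
```

This keeps the original loop over text nodes, which may be convenient if the text itself is also needed inside the loop.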
Answered 2013-03-20T01:44:48.027