c# - Selecting all nodes containing text with XPath

Question

I have been struggling with this problem for the past couple of days. Say I want to get all the text() nodes from an HTML document, but I only want the XPath of the element that contains each piece of text. Example:

 foreach (var textNode in node.SelectNodes(".//text()"))
 {
     // do stuff here
 }

However, when it comes to retrieving the XPath of the textNode using textNode.XPath, I get the full XPath including the #text node:

/html[1]/body[1]/div[1]/a[1]/#text

Yet I only want the containing node of the text, for example:

/html[1]/body[1]/div[1]/a[1]

Could anyone point me toward a better XPath expression that retrieves all nodes that contain text, but gives me the XPath only up to the containing node?

score 3 · Accepted Answer

Instead of:

.//text()

use:

.//*[normalize-space(text())]

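This selects all descendant elements of the context (current) node whose first text-node child is not whitespace-only. (In XPath 1.0, normalize-space(text()) converts only the first text child to a string; use .//*[text()[normalize-space()]] to match elements that have at least one non-whitespace text-node child in any position.)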
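
To see which elements the expression picks up, here is a minimal sketch. It assumes a small well-formed snippet loaded with System.Xml instead of the HTML parser used in the question (the XPath 1.0 semantics should be the same); the class name TextElementDemo, the helper Select and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TextElementDemo
{
    // Hypothetical helper: returns the names of the elements matched by
    // the given XPath expression, evaluated relative to the root element.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var names = new List<string>();
        foreach (XmlNode element in doc.DocumentElement.SelectNodes(xpath))
        {
            names.Add(element.Name);
        }
        return names;
    }

    public static void Main()
    {
        // Made-up sample: <a>, <p> and <b> have text children, <span/> has none.
        const string xml = "<div><a>link</a><p>text <b>bold</b></p><span/></div>";

        // Prints: a, p, b
        Console.WriteLine(string.Join(", ", Select(xml, ".//*[normalize-space(text())]")));
    }
}
```

Note that <p> is matched even though it also has a child element, and the empty <span/> is skipped. Because the result is a set of elements, asking each match for its path no longer yields a trailing #text step.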

score 2 · Accepted Answer

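Why don't you just strip the last step from the path string?

// Split on '/', then re-join everything except the final "#text" step.
string[] elements = getXPath(textNode).Split(new char[1] { '/' });
return String.Join("/", elements, 0, elements.Length - 1);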
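
The same trimming idea can be sketched without splitting and re-joining, by cutting the string at its last slash. This is only an illustration: the class XPathTrim and the method ParentPath are hypothetical names, and the input is the example path from the question.

```csharp
using System;

public static class XPathTrim
{
    // Drops the final step (for example "/#text") from an absolute XPath string.
    public static string ParentPath(string xpath)
    {
        int lastSlash = xpath.LastIndexOf('/');

        // No slash, or only the leading one: there is no parent step to cut back to.
        return lastSlash > 0 ? xpath.Substring(0, lastSlash) : xpath;
    }

    public static void Main()
    {
        // Prints: /html[1]/body[1]/div[1]/a[1]
        Console.WriteLine(ParentPath("/html[1]/body[1]/div[1]/a[1]/#text"));
    }
}
```

If the node type exposes its parent directly (HtmlAgilityPack's HtmlNode has a ParentNode property, assuming that is the library in use here), reading textNode.ParentNode.XPath avoids the string handling altogether.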
