
Given the following XML:

<div id="Main">
    <div class="quote">
        This is a quote and I don't want this text
    </div> 
    <p>
        This is content.
    </p>
    <p>  
        This is also content and I want both of them
    </p>
</div>

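Is there an XPath that selects the inner text of div#Main as a single node, while excluding the text of any div.quote?

I only want the text: "This is content.This is also content and I want both of them"

Thanks in advance.

Here is the code I use to test the XPath. I'm using .NET with HtmlAgilityPack, but I believe the XPath should work in any language.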

[Test]
public void TestSelectNode()
{
    // Arrange 
    var html = "<div id=\"Main\"><div class=\"quote\">This is a quote and I don't want this text</div><p>This is content.</p><p>This is also content and I want both of them</p></div>";
    var xPath = "//div/*[not(self::div and @class=\"quote\")]/text()";

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Action
    var node = doc.DocumentNode.SelectSingleNode(xPath);

    // Assert
    Assert.AreEqual("This is content.This is also content and I want both of them", node.InnerText);
}

The test obviously fails, because the XPath is still not correct.

Test 'XPathExperiments/TestSelectNode' failed:
    Expected values to be equal.

    Expected Value : "This is content.This is also content and I want both of them"
    Actual Value   : "This is content."

3 Answers


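I don't think any XPath will give you this as a single node, because the value you are trying to get is not a single node. Is there a reason you can't do this?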

StringBuilder sb = new StringBuilder();
// Action
var nodes = doc.DocumentNode.SelectNodes(xPath);
foreach(var node in nodes)
{
   sb.Append(node.InnerText);
}

// Assert
Assert.AreEqual("This is content.This is also content and I want both of them", 
                sb.ToString());
Answered 2013-01-31T10:55:29.047

You want the text of every child of the div that is not itself a div with class "quote":

div/*[not(self::div and @class="quote")]/text()
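
To see what that expression selects, here is a minimal sketch. It assumes the markup is well-formed XML, so it uses the built-in `System.Xml` classes instead of HtmlAgilityPack; the class `QuoteFilterSketch` and the helper `JoinText` are hypothetical names introduced only for illustration. The relative path works here because div#Main is the root element.

```csharp
using System;
using System.Text;
using System.Xml;

public static class QuoteFilterSketch
{
    // Hypothetical helper: concatenates the text nodes matched by an XPath.
    public static string JoinText(string xml, string xPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var sb = new StringBuilder();
        // SelectNodes returns every matching node, not just the first one.
        foreach (XmlNode node in doc.SelectNodes(xPath))
        {
            // Trim guards against indentation around the text in pretty-printed markup.
            sb.Append(node.Value.Trim());
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var xml = "<div id='Main'><div class='quote'>This is a quote and I don't want this text</div>"
                + "<p>This is content.</p><p>This is also content and I want both of them</p></div>";
        var xPath = "div/*[not(self::div and @class=\"quote\")]/text()";

        // Prints: This is content.This is also content and I want both of them
        Console.WriteLine(JoinText(xml, xPath));
    }
}
```

The predicate only filters the direct children of the div, so the two `<p>` text nodes come back as separate nodes and still have to be joined in code.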
Answered 2013-01-30T21:46:16.773

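No XPath will give you the combined string value, because XPath selects node objects and only node objects, even when they are text nodes.

Since the nodes in question are <p> elements inside a <div>, I would use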

div[@id='Main']/p/text()

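It yields a list of the text nodes in the <p> elements inside a <div id="Main">. Iterating over these and concatenating their text content should be straightforward.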
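
The iterate-and-concatenate step might look like the sketch below. It is an illustration under assumptions, not the only way to do it: it parses the markup as XML with `System.Xml` rather than HtmlAgilityPack, and `ParagraphTextSketch` and `MainParagraphText` are hypothetical names. The relative XPath resolves from the document node because div#Main is the root element.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class ParagraphTextSketch
{
    // Hypothetical helper: joins the text of every <p> directly under div#Main.
    public static string MainParagraphText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XmlNodeList is non-generic, so Cast<XmlNode>() is needed before LINQ.
        var texts = doc.SelectNodes("div[@id='Main']/p/text()")
                       .Cast<XmlNode>()
                       .Select(node => node.Value.Trim());

        return string.Concat(texts);
    }

    public static void Main()
    {
        var xml = "<div id='Main'><div class='quote'>This is a quote and I don't want this text</div>"
                + "<p>This is content.</p><p>This is also content and I want both of them</p></div>";

        // Prints: This is content.This is also content and I want both of them
        Console.WriteLine(MainParagraphText(xml));
    }
}
```

With HtmlAgilityPack the same loop applies to the result of `doc.DocumentNode.SelectNodes(...)`, reading `InnerText` from each node as in the first answer.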

Answered 2019-04-27T18:42:10.183