java - Searching XML using XPath with JDOM/JAXEN/SAXON

Question

I have an XML document that I am parsing with JDOM 2.0.5. The following code works fine, and the bookNodes list contains all the book nodes from my XML file:

SAXBuilder builder = new SAXBuilder();

// @see http://xerces.apache.org/xerces-j/features.html
// Disable namespace validation
builder.setFeature("http://xml.org/sax/features/namespaces", false);

Document doc = null;

try {
    doc = builder.build(xmlURL);
} catch (JDOMException | IOException e) {
    e.printStackTrace();
    return null; 
}

// Get the browse element
Element browse = doc.getRootElement().getChild("browse");

// Get all of browse's book children
List<Element> bookNodes = browse.getChildren("book");

for (Element book : bookNodes) {
    // Do things with the selected nodes
    //...
}

Here is a sample of my XML data:

<?xml version="1.0" encoding="utf-8"?> 
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.example.com/XMLSchema" version="1">
    <status code="0"/>
    <link>http://www.example.com/books</link>
    <description>Browse, search and ....</description>
    <language>en-us</language>
    <pubDate>Sun, 09 Nov 2014 00:00:02 +0000</pubDate>
    <copyright>Copyright 2014, XXX</copyright>
    <category>Books</category>
    <browse>
        <book id="bk101">
            <author>Gambardella, Matthew</author>
            <title>XML Developer's Guide</title>
            <genre>Computer</genre>
            <price>44.95</price>
            <publish_date>2000-10-01</publish_date>
            <description>An in-depth look at creating applications 
            with XML.</description>
        </book>
        <book id="bk102">
            <author>Ralls, Kim</author>
            <title>The Midnight Rain</title>
            <genre>Fantasy</genre>
            <price>5.95</price>
            <publish_date>2000-12-16</publish_date>
            <description>A former architect battles corporate zombies, 
            an evil sorceress, and her own childhood to become queen 
            of the world.</description>
        </book>
        <book id="bk105">
            <author>Corets, Eva</author>
            <title>The Sundered Grail</title>
            <genre>Fantasy</genre>
            <price>5.95</price>
            <publish_date>2001-09-10</publish_date>
            <description>The two daughters of Maeve, half-sisters, 
            battle one another for control of England. Sequel to 
            Oberon's Legacy.</description>
        </book>
        <book id="bk106">
            <author>Randall, Cynthia</author>
            <title>Lover Birds</title>
            <genre>Romance</genre>
            <price>4.95</price>
            <publish_date>2000-09-02</publish_date>
            <description>When Carla meets Paul at an ornithology 
            conference, tempers fly as feathers get ruffled.</description>
        </book>
    </browse>
</Books>

Question 1:

I want to select only the book nodes that contain some text. So I used the XPath query //book[contains(./title, 'The')] with jaxen-1.1.6 and the following code:

filter = "//book[contains(./title, 'The')]"; // should return 2 elements (2nd and 3rd nodes)

// use the default implementation
XPathFactory xFactory = XPathFactory.instance();

XPathExpression<Element> expr = xFactory.compile(filter, Filters.element());

List<Element> bookNodes = expr.evaluate(doc);

But the bookNodes list is empty!

What is wrong with my code?

Question 2:

I will need more advanced features to search my XML fields, for example:

filter = "//book[matches(./title, '^ *XML.*?Developer.*?Guide *$', 'i')]"; // should return 1 element (1st node)

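So I am using saxon9he, which supports XPath 2.0+, but I don't know how to make it work with JDOM2 and my code above.

Could you show me how to do this based on my code? (I have already searched Google for help but could not find anything.)

An answer to Q.1 will help me understand what I did wrong, but an answer to Q.2 will help me move forward with my small personal application.

Thanks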

score 1 · Accepted Answer

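The XPath language is defined only over namespace-well-formed XML, and it can produce unexpected results if you try to use it on an XML tree built without namespaces. Rather than ignoring namespaces, you should use them properly: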

SAXBuilder builder = new SAXBuilder();
Document doc = null;

try {
    doc = builder.build(xmlURL);
} catch (JDOMException | IOException e) {
    e.printStackTrace();
    return null; 
}

Namespace ns = Namespace.getNamespace("http://www.example.com/XMLSchema");

// Get the browse element (it lives in the document's default namespace)
Element browse = doc.getRootElement().getChild("browse", ns);

// Get all of browse's book children
List<Element> bookNodes = browse.getChildren("book", ns);

for (Element book : bookNodes) {
    // Do things with the selected nodes
    //...
}

For XPath, you need to bind the namespace URI to a prefix:

filter = "//ns:book[contains(./ns:title, 'The')]";

// use the default implementation
XPathFactory xFactory = XPathFactory.instance();

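// Use the diamond operator instead of a raw type, and a name that does not clash with the SAXBuilder above
XPathBuilder<Element> xpathBuilder = new XPathBuilder<>(filter, Filters.element());
xpathBuilder.setNamespace("ns", "http://www.example.com/XMLSchema");
XPathExpression<Element> expr = xpathBuilder.compileWith(xFactory);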

List<Element> bookNodes = expr.evaluate(doc);
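
To see why the prefix matters, here is a minimal sketch that swaps in the JDK's built-in DOM parser and XPath 1.0 engine (not JDOM/Jaxen, so that it runs without extra jars). The class name DefaultNamespaceDemo and the trimmed sample document are assumptions made for illustration. In XPath 1.0 an unprefixed name test only matches elements in no namespace, so //book finds nothing once the elements sit in a default namespace; the local-name() test is shown only to confirm the elements are there, not as a recommended replacement for binding a prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's DOM and XPath 1.0 engine, not JDOM/Jaxen.
final class DefaultNamespaceDemo {

    // Trimmed version of the sample document: two books in a default namespace.
    static final String XML =
        "<Books xmlns=\"http://www.example.com/XMLSchema\"><browse>"
        + "<book id=\"bk101\"><title>XML Developer's Guide</title></book>"
        + "<book id=\"bk102\"><title>The Midnight Rain</title></book>"
        + "</browse></Books>";

    // Returns how many nodes the given node-set expression selects.
    static int count(String nodeExpr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep namespaces, as the answer recommends
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                    "count(" + nodeExpr + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name: looks for <book> in no namespace, so nothing matches.
        System.out.println(count("//book"));                   // prints 0
        // Ignoring the namespace shows that the elements do exist.
        System.out.println(count("//*[local-name()='book']")); // prints 2
    }
}
```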

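Regarding question 2, Saxon's XPath engine can be used with JDOM2 trees, but you have to use Saxon's XPath API rather than JDOM's. That in turn means you have to use the standard javax.xml.xpath way of associating namespace prefixes with URIs, which is more cumbersome than JDOM's: you have to define your own implementation of NamespaceContext or use a third-party implementation such as Spring's SimpleNamespaceContext.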

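// Assumption: config was not defined in the original snippet; here a fresh
// Configuration is shared by the wrapper and the evaluator
Configuration config = new Configuration(); // net.sf.saxon
JDOM2DocumentWrapper docw =
        new JDOM2DocumentWrapper(doc, config); // net.sf.saxon.option.jdom2

XPathEvaluator xpath = new XPathEvaluator(config); // net.sf.saxon.xpath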
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
nsCtx.bindNamespaceUri("ns", "http://www.example.com/XMLSchema");
xpath.setNamespaceContext(nsCtx);
List<?> bookNodes = (List<?>)xpath.evaluate(
   "//ns:book[matches(./ns:title, '^ *XML.*?Developer.*?Guide *$', 'i')]", docw,
   XPathConstants.NODESET);

(Adapted from Saxon's JDOM2Example.java)
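
If you would rather not pull in Spring just for SimpleNamespaceContext, defining your own NamespaceContext is only a few lines. The following is a minimal sketch under some assumptions: the class name MapNamespaceContext and the trimmed sample document are made up for illustration, it does not handle the reserved xml and xmlns prefixes, and it is exercised with the JDK's built-in XPath engine, which is XPath 1.0 and therefore has contains() but not matches(). The same context object could be passed to setNamespaceContext on any javax.xml.xpath.XPath implementation.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical map-backed NamespaceContext (a sketch, not Spring's class).
final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri;

    MapNamespaceContext(Map<String, String> prefixToUri) {
        this.prefixToUri = prefixToUri;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI; // unbound prefix
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                return e.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }

    // Trimmed version of the sample document: three books in a default namespace.
    static final String XML =
        "<Books xmlns=\"http://www.example.com/XMLSchema\"><browse>"
        + "<book id=\"bk101\"><title>XML Developer's Guide</title></book>"
        + "<book id=\"bk102\"><title>The Midnight Rain</title></book>"
        + "<book id=\"bk105\"><title>The Sundered Grail</title></book>"
        + "</browse></Books>";

    // Evaluates expr against XML with "ns" bound to the document's namespace
    // and returns the number of selected nodes.
    static int countMatches(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(
                    Collections.singletonMap("ns", "http://www.example.com/XMLSchema")));
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two of the three titles contain "The".
        System.out.println(countMatches("//ns:book[contains(ns:title, 'The')]")); // prints 2
    }
}
```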

score 0 · Accepted Answer

For completeness, here is how to do it with Saxon's s9api interface:

Processor proc = new Processor(false); // false = no licensed (PE/EE) features required
XdmNode docw = proc.newDocumentBuilder().wrap(doc);
XPathCompiler xpath = proc.newXPathCompiler();
xpath.declareNamespace("ns", "http://www.example.com/XMLSchema");
XdmValue bookNodes = xpath.evaluate(
   "//ns:book[matches(./ns:title, '^ *XML.*?Developer.*?Guide *$', 'i')]", docw);
for (XdmItem book : bookNodes) {
 ....
}
