java - Streaming XPath evaluation

Question

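Are there any production-ready libraries for evaluating XPath expressions against a given XML document in a streaming fashion? My investigation shows that most existing solutions load the entire DOM tree into memory before evaluating the XPath expression.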

score 4 · Accepted Answer

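XSLT 3.0 provides a streaming mode of processing, and this will become a standard once the XSLT 3.0 W3C specification becomes a W3C Recommendation.

At the time of writing this answer (May 2011), Saxon provides some support for XSLT 3.0 streaming.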

score 3 · Accepted Answer

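Would this be practical for a complete XPath implementation, given that XPath syntax allows for:

/AAA/XXX/following::*

and

/AAA/BBB/following-sibling::*

which implies look-ahead requirements? That is, from a particular node you are going to have to load the rest of the document anyway.

The documentation for the Nux library (specifically StreamingPathFilter) makes this point and references some implementations that rely on a subset of XPath. Nux claims to offer some streaming query capability, but given the above, there will be some limitations in terms of XPath implementation.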

score 3 · Accepted Answer

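There are several options:

DataDirect Technologies sells an XQuery implementation that employs projection and streaming where possible. It can handle files in the multi-gigabyte range, that is, larger than available memory. It is a thread-safe library, so it is easy to integrate. Java only.
Saxon has an open-source edition and a modestly priced commercial edition that will stream in some contexts. Java, but there is also a .NET port.
MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.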

score 1 · Accepted Answer

Try Joost.

answered 2009-06-15T16:08:13.827

score 1 · Accepted Answer

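Though I have no practical experience with it, I think QuiXProc ( http://code.google.com/p/quixproc/ ) is worth mentioning. It is a streaming approach to XProc, and it uses libraries that provide streaming support for XPath, among other things.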

score 0 · Accepted Answer

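FWIW, I have used Nux streaming filter XPath queries against very large (> 3GB) files, and it has both worked flawlessly and used very little memory. My use case was slightly different (not validation-centric), but I would highly encourage you to give Nux a try.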

score 0 · Accepted Answer

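I think I'll go for custom code. The .NET library gets us quite close to the target if one just wants to read some paths of the XML document.

Since all the solutions I have seen so far support only a subset of XPath, this one is of the same kind. The subset is really small, though. :)

This C# code reads an XML file and counts the nodes matching a given explicit path. You can also operate on attributes easily, using the xr["attrName"] syntax.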

  // Count matching elements; asArgs[1] holds the path of the XML file to read.
  int c = 0;
  var r = new System.IO.StreamReader(asArgs[1]);
  var se = new System.Xml.XmlReaderSettings();
  var xr = System.Xml.XmlReader.Create(r, se);
  // Names of the elements that are currently open, outermost first.
  var lstPath = new System.Collections.Generic.List<string>();
  var sbPath = new System.Text.StringBuilder();
  while (xr.Read()) {
    //Console.WriteLine("type " + xr.NodeType);
    if (xr.NodeType == System.Xml.XmlNodeType.Element) {
      lstPath.Add(xr.Name);
    }

    // Rebuilding the path string on every node is not free: if parsing
    // the file alone takes 1 time unit, this loop takes about 1.0.
    sbPath.Clear();
    foreach (string n in lstPath) {
      sbPath.Append('/');
      sbPath.Append(n);
    }
    // This takes about 0.6 time units.
    string sPath = sbPath.ToString();

    // An element is complete at its end tag, or at once if it is empty (<a/>).
    if (xr.NodeType == System.Xml.XmlNodeType.EndElement
        || xr.IsEmptyElement) {
      if (xr.Name == "someElement" && lstPath[0] == "main")
        c++;
      // Or test a simple XPath explicitly:
      // if (sPath == "/main/someElement")

      // Leave the element: drop it from the current path.
      lstPath.RemoveAt(lstPath.Count - 1);
    }
  }
  xr.Close();
  r.Close();
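
Since the question is tagged java, the same idea can probably be sketched with the StAX API that ships with the JDK (javax.xml.stream). Treat this as a rough illustration under stated assumptions, not a tested production solution: StreamingPathCounter and count are hypothetical names, matching is done on local element names (namespaces are ignored), and only explicit absolute paths such as /main/someElement are supported. Unlike the C# loop above, it tests the full path rather than just the root and element name.

```java
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of any library.
final class StreamingPathCounter {

    // Counts the elements whose absolute path equals `path`,
    // e.g. "/main/someElement". Only the current path is kept in memory.
    static int count(Reader xml, String path) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the input needs no DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        StringBuilder current = new StringBuilder();  // path of the open element
        Deque<Integer> lengths = new ArrayDeque<>();  // path length before each open element
        int matches = 0;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // Entering an element: extend the path and test it.
                        lengths.push(current.length());
                        current.append('/').append(reader.getLocalName());
                        if (path.contentEquals(current)) {
                            matches++;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        // Leaving an element: cut the path back to its parent.
                        current.setLength(lengths.pop());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return matches;
    }
}
```

You would call it with something like StreamingPathCounter.count(new java.io.FileReader(fileName), "/main/someElement"), where fileName is whatever file you want to scan. Truncating the StringBuilder on each end tag avoids rebuilding the path string for every node, which is the cost noted in the C# comments.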
