xml - Is it possible to make the Tokenizer namespace-aware (when using the Splitter)?

Question

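We have an application that processes very large XML files (3GB+). For splitting we use the Tokenizer. The XML we receive uses different namespace prefixes, or no prefix at all. Can the Tokenizer handle this? The only thing I found is the inheritNamespaceTagName attribute, which inherits the default namespace, but unfortunately it does not work when namespace prefixes are used.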

Thanks for your help!

Sample 1:

<foo:orders xmlns:foo="http://foo.com">
  <foo:order id="1">Camel in Action</foo:order>
  <foo:order id="1">ActiveMQ in Action</foo:order>
  <foo:order id="1">DSL in Action</foo:order>
</foo:orders>

Sample 2:

<bar:orders xmlns:bar="http://foo.com">
  <bar:order id="1">Camel in Action</bar:order>
  <bar:order id="1">ActiveMQ in Action</bar:order>
  <bar:order id="1">DSL in Action</bar:order>
</bar:orders>

Our route:

<route id="orderProcessorRoute">
  <from uri="file:process-xml?delete=true"/>
  <split streaming="true">
    <tokenize token="order" xml="true"/>
    <to uri="bean:xmlParseBean"/>
    <to uri="vm:orderAggregator"/>
  </split>
  <to uri="file:backup"/>
</route>

score 2 · Accepted Answer

See the inheritNamespaceTagName option, documented at http://camel.apache.org/splitter.html. In your case it should be:

<tokenize token="order" inheritNamespaceTagName="orders" xml="true"/>

Give it a try.
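
The idea behind inheritNamespaceTagName is that each split-off `order` fragment should carry the namespace declarations it needs so it can be parsed on its own. For comparison, here is a minimal sketch of the same split done with a namespace-aware streaming parser from Python's standard library. This is not Camel and not how Camel's tokenizer is implemented. Records are matched on namespace URI plus local name, so `foo:`, `bar:`, or no prefix all behave the same. `split_records`, `split_text`, and the sample strings are hypothetical; `ET.tostring` re-serializes each record with a prefix of its own choosing (typically `ns0`) together with the matching `xmlns` declaration.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, namespace, local_name):
    """Stream `source` and yield each matching element as a standalone XML string.

    Matching uses the expanded name {namespace}local_name, so whatever prefix
    the document uses (foo:, bar:, or a default namespace) is irrelevant.
    """
    wanted = "{%s}%s" % (namespace, local_name)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != wanted:
            continue
        elem.tail = None  # don't emit whitespace that follows the end tag
        # tostring() writes the xmlns declaration the fragment needs.
        yield ET.tostring(elem, encoding="unicode")
        elem.clear()  # drop the record's text/children; the empty shell stays in the tree


# Hypothetical trimmed samples: two different prefixes for the same namespace.
SAMPLE_FOO = """<foo:orders xmlns:foo="http://foo.com">
  <foo:order id="1">Camel in Action</foo:order>
  <foo:order id="2">ActiveMQ in Action</foo:order>
</foo:orders>"""
SAMPLE_BAR = """<bar:orders xmlns:bar="http://foo.com">
  <bar:order id="1">Camel in Action</bar:order>
  <bar:order id="2">ActiveMQ in Action</bar:order>
</bar:orders>"""


def split_text(xml_text, namespace="http://foo.com", local_name="order"):
    """Convenience wrapper for in-memory strings; real input would be a file."""
    return list(split_records(io.BytesIO(xml_text.encode("utf-8")), namespace, local_name))


for fragment in split_text(SAMPLE_BAR):
    print(fragment)
```

Because matching and re-serialization both work on the namespace URI, the two inputs produce identical fragments even though their prefixes differ.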

This page has links to several blog posts about splitting big XML files with Camel: http://camel.apache.org/articles. The second one is about camel-stax, a different approach that models the data with JAXB-annotated POJOs.
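
camel-stax itself is a Java component that unmarshals each record into a JAXB-annotated class. As a rough, language-neutral illustration of that "stream records into typed objects" style, the sketch below uses only Python's standard library (not the camel-stax API) to turn each `order` element into an instance of a hypothetical `Order` dataclass while parsing incrementally. `Order`, `read_orders`, and `SAMPLE` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import BinaryIO, Iterator

ORDERS_NS = "http://foo.com"  # namespace URI from the question's samples


@dataclass
class Order:
    """Hypothetical typed record, playing the role of a JAXB-annotated POJO."""
    id: str
    title: str


def read_orders(source: BinaryIO) -> Iterator[Order]:
    """Incrementally parse `source` and yield one Order per order element."""
    wanted = "{%s}order" % ORDERS_NS  # matched by URI, so any prefix works
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted:
            yield Order(id=elem.get("id", ""), title=(elem.text or "").strip())
            elem.clear()  # release the element's content once it has been mapped


SAMPLE = b"""<bar:orders xmlns:bar="http://foo.com">
  <bar:order id="1">Camel in Action</bar:order>
  <bar:order id="2">DSL in Action</bar:order>
</bar:orders>"""

for order in read_orders(io.BytesIO(SAMPLE)):
    print(order)
```

Mapping each record as soon as its end tag is seen means a record's content is held only until it has been turned into an object, which is the property that matters for multi-gigabyte inputs.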

In addition, we are working on a camel-vtdxml component, which can also split big XML files using the vtd-xml library. See http://camel.apache.org/vtd-xml for details.
