haskell - How to skip elements in xml-conduit

Question

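I have to process fairly large XML files, and I want to use the streaming API of xml-conduit to walk through them and extract the information I need. Streaming with xml-conduit is particularly attractive in my case: I don't need much of the data in these files, and I only need to run simple aggregations over it, so conduits are a perfect fit.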
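When the aggregation only needs a few attribute values, the raw Event stream produced by xml-conduit can also be folded directly, without any tag parsers. The sketch below is an alternative technique, not the one from the accepted answer: it sums the age attribute of every person element in a single streaming pass and ignores every other event. personAge and totalAge are hypothetical names, and the element and attribute names are assumptions borrowed from the answer's sample data.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch (assumption: only start-tag attributes are needed): sum the "age"
-- attribute of every <person> element in one streaming pass.
import qualified Data.ByteString.Lazy as L
import Data.Conduit (runConduit, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Text as T
import qualified Data.Text.Read as TR
import Data.XML.Types (Content (..), Event (..), Name (..))
import Text.XML.Stream.Parse (def, parseLBS)

-- Hypothetical helper: extract the age from a <person age="..."> start tag.
-- Every other event (other elements, text, end tags) yields Nothing.
personAge :: Event -> Maybe Int
personAge (EventBeginElement name attrs)
  | nameLocalName name == "person" = do
      contents <- lookup "age" attrs
      -- An attribute value may be split into several text pieces.
      let raw = T.concat [t | ContentText t <- contents]
      case TR.decimal raw of
        Right (n, rest) | T.null rest -> Just n
        _ -> Nothing
personAge _ = Nothing

-- Fold over the event stream without materialising it as a list.
totalAge :: L.ByteString -> IO Int
totalAge xml =
  runConduit $ parseLBS def xml .| CL.mapMaybe personAge .| CL.fold (+) 0

main :: IO ()
main = totalAge sample >>= print
  where
    sample = "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
```

Because this works event by event, it skips unknown elements for free, but it cannot easily look at an element's children; for that, the tag parsers used in the answer are a better fit.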

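However, I don't always know the exact structure of the files. They are produced all over the world by different versions of (sometimes buggy) software, so I can't impose a schema.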

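I do know which elements I'm interested in and what they look like. But, as I said, these elements can appear in a different order relative to other elements, and so on.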

I think all I need is to skip every element I'm not interested in and consider only the ones I want.

My first idea was to write something like this:

tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)

But it doesn't compile, because ignoreTag returns Maybe ().
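For illustration, one way to make the two branches of the attempt above agree is to give them the same result type: wrap a parsed person in Just, map a skipped node to Nothing, combine the two with xml-conduit's choose (ConduitT does not, as far as I know, have an Alternative instance, so <|> is not available on it), and drop the Nothing results at the end. This is a minimal sketch, not the accepted answer's approach; Person, parsePerson, personOrSkip and people are hypothetical names, and parsePerson is modelled on the example in the Text.XML.Stream.Parse documentation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Maybe (catMaybes)
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Person = Person Int Text deriving (Eq, Show)

-- Hypothetical parser; extra attributes on <person> are tolerated.
-- Note: read is partial and will fail on a non-numeric age.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age" <* ignoreAttrs) $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

-- Both branches now have type Maybe (Maybe Person):
-- Just (Just p) for a person, Just Nothing for a skipped node,
-- and Nothing once there is nothing left to consume.
personOrSkip :: MonadThrow m => ConduitT Event o m (Maybe (Maybe Person))
personOrSkip = choose
  [ fmap (fmap Just) parsePerson
  , fmap (fmap (const Nothing)) ignoreAnyTreeContent
  ]

-- Collect all persons, discarding the placeholders for skipped nodes.
people :: L.ByteString -> IO [Person]
people xml = runConduit $ parseLBS def xml .| (catMaybes <$> many personOrSkip)

main :: IO ()
main = people sample >>= print
  where
    sample = "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
```

Unlike the answer's pipeline, this builds the whole list in memory; for large inputs the answer's two-stage version, which yields one Person at a time, is the more streaming-friendly choice.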

How do I skip all "unknown" tags when using the xml-conduit streaming API?

score 1 · Accepted Answer

As suggested here:

λ> runConduit $ Text.XML.Stream.Parse.parseLBS def  "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume 
[Person 25 "Michael",Person 2 "Eliezer"]
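A self-contained version of the session above, for reference. It assumes xml-conduit with the conduit 1.3 API (ConduitT). The answer does not show Person or parsePerson, so those definitions are hypothetical and modelled on the example in the Text.XML.Stream.Parse documentation; keepPersons and extractPeople are also illustrative names. The first stage (many_ with choose) re-yields the events of every person tree via takeTree and discards any other tree or text node via ignoreAnyTreeContent; the second stage (manyYield) turns the remaining events into Person values.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Person = Person Int Text deriving (Eq, Show)

-- Hypothetical parser (not shown in the answer); extra attributes are tolerated.
-- Note: read is partial and will fail on a non-numeric age.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age" <* ignoreAttrs) $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

-- Stage 1: pass every <person> tree downstream unchanged and
-- silently consume any other tree or text node.
keepPersons :: MonadThrow m => ConduitT Event Event m ()
keepPersons = many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])

-- Stage 2: parse the filtered events into Person values one at a time.
extractPeople :: L.ByteString -> IO [Person]
extractPeople xml =
  runConduit $ parseLBS def xml .| keepPersons .| manyYield parsePerson .| CL.consume

main :: IO ()
main = extractPeople sample >>= print
  where
    sample = "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
```

Keeping the skipping in its own stage means parsePerson never has to know about unrelated elements, which suits files whose surrounding structure varies; replacing CL.consume with a fold would aggregate the persons without collecting them into a list.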
