The xml-conduit documentation only lists examples in which the whole XML tree is consumed by a ConduitM, for example:

<people>
    <person age="25">Michael</person>
    <person age="2">Eliezer</person>
</people>

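I am trying to parse a tree that, besides the <person> tags above, also contains deeply nested subtrees I am not interested in (their exact schema may not even be known), for example: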

<people>
    <person age="25">Michael</person>
    <tagImNotInterestedIn><!-- deeply nested complex subtree --></tagImNotInterestedIn>
    <person age="2">Eliezer</person>
</people>

Parsing this with the people.hs example from the docs throws the following exception:

people.hs: XmlException {xmlErrorMessage = "Expected end tag for: Name {nameLocalName = \"people\", nameNamespace = Nothing, namePrefix = Nothing}", xmlBadInput = Just (EventBeginElement (Name {nameLocalName = "tagImNotInterestedIn", nameNamespace = Nothing, namePrefix = Nothing}) [])}

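Basically, I am looking for a way to ignore every tag (including all of its children and attributes) except the specific tags I provide a parser for. This is obviously easy with a DOM-based parser such as HXT, but the documentation for tag explicitly states that it fails unless all child nodes are consumed.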

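The only hypothetical approach I can think of is to use functions from Control.Exception to build a conduit with a Maybe a result (returning Nothing on an exception), and then combine it with the actual parser using orE.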

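Although it has already been stated that the xml-conduit API needs some updating, I think there must be a less hacky way to ignore an entire subtree. Any ideas would be appreciated!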

1 Answer

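Since version 1.5.0, Text.XML.Stream.Parse provides a function takeTree, which can probably be used for this purpose. In the code below, takeTree forwards the events of each <person> subtree downstream, ignoreAnyTreeContent discards everything else, and manyYield parsePerson parses the forwarded events.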

{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad                (void)
import           Control.Monad.Trans.Class    (lift)
import           Control.Monad.Trans.Resource (MonadThrow, runResourceT)
import           Data.ByteString.Lazy         (ByteString)
import           Data.ByteString.Lazy.Char8   (concat)
import           Data.Conduit                 (ConduitT, runConduit, (.|))
import           Data.Conduit.List            (mapM_)
import           Data.Text                    (Text, unpack)
import           Data.XML.Types               (Event)
import           Prelude                      hiding (concat, mapM_)
import           Text.XML.Stream.Parse        (choose, content, def,
                                               ignoreAnyTreeContent,
                                               ignoreAttrs, manyYield, many_,
                                               parseLBS, requireAttr, tag',
                                               tagNoAttr, takeTree)

data Person = Person Int Text deriving Show

parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    return $ Person (read $ unpack age) name

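parsePeople :: MonadThrow m => ConduitT Event Person m ()
parsePeople = void $ tagNoAttr "people" $
  -- Forward the events of every <person> subtree downstream and skip any
  -- other subtree or content, so the downstream parser only sees <person>.
  many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])
    .| manyYield parsePerson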

persons :: ByteString
persons = concat [
    "<people>"
  , "<foo/>"
  , "<person age=\"25\">Michael</person>"
  , "<bar/>"
  , "<person age=\"2\">Eliezer</person>"
  , "<tagImNotInterestedIn>x</tagImNotInterestedIn>"
  , "</people>"
  ]

main :: IO ()
main = runResourceT $
  runConduit $ parseLBS def persons .| parsePeople .| mapM_ (lift . print)

The code above is based on the xml-conduit sample. Only parsePeople has changed.

λ> main
Person 25 "Michael"
Person 2 "Eliezer"
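
A shorter variant may also be worth trying. The sketch below assumes that manyYield' (exported by Text.XML.Stream.Parse in recent xml-conduit releases, and not used in the code above) silently skips any tag or content that the given parser does not match, which would make the explicit takeTree/ignoreAnyTreeContent pipeline unnecessary. The helper parseAll is a hypothetical name introduced only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad                (void)
import           Control.Monad.Trans.Resource (MonadThrow)
import           Data.ByteString.Lazy         (ByteString)
import           Data.Conduit                 (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List            as CL
import           Data.Text                    (Text, unpack)
import           Data.XML.Types               (Event)
import           Text.XML.Stream.Parse        (content, def, manyYield',
                                               parseLBS, requireAttr, tag',
                                               tagNoAttr)

data Person = Person Int Text deriving (Eq, Show)

-- Same parser as in the code above.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    return $ Person (read $ unpack age) name

-- Assumption: manyYield' yields every successfully parsed <person> and
-- ignores any other subtree or content inside <people>.
parsePeople :: MonadThrow m => ConduitT Event Person m ()
parsePeople = void $ tagNoAttr "people" $ manyYield' parsePerson

-- Same input as above, including the tags that should be skipped.
persons :: ByteString
persons = mconcat
  [ "<people>"
  , "<foo/>"
  , "<person age=\"25\">Michael</person>"
  , "<bar/>"
  , "<person age=\"2\">Eliezer</person>"
  , "<tagImNotInterestedIn>x</tagImNotInterestedIn>"
  , "</people>"
  ]

-- Hypothetical helper: parse a document and collect all people in a list.
parseAll :: ByteString -> IO [Person]
parseAll input = runConduit $ parseLBS def input .| parsePeople .| CL.consume

main :: IO ()
main = parseAll persons >>= print
```

Under those assumptions it should yield the same two Person values as the code above. The takeTree version remains the more flexible choice when the raw events of the selected subtrees need to be processed by a separate conduit.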
Answered 2018-08-01T20:01:43.697