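I am struggling to delete an Element and all of its children using Haskell. The task is to remove all table tags from a given XML document. (Maybe I have not understood the concept of cursors yet, or I am missing something else.)
I have tried three different approaches:
- lenses with a traversal/filter, setting the filtered values to a new element: here only the tag is replaced, not the content
- accessing the table element with a cursor, resetting its content there and getting the document root again by walking the cursor up to the root: nothing is filtered
- recursively filtering the children of the document root: nothing is filtered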
Tools
- xml-conduit
- xml-lens
- ghc-8.0.1
Input (test.xml) / output
INPUT                         | EXPECTED OUTPUT (for the filtered cases)
<?xml version="1.0"?>         | <?xml version="1.0"?>
<root>                        | <root>
  <a>                         |   <a>
    ...                       |     ...
  </a>                        |   </a>
  <b>                         |   <b>
    <table>                   |     <bb>
      <!--table entries-->    |       ...
    </table>                  |     </bb>
    <bb>                      |   </b>
      ...                     |   <c>
    </bb>                     |     <cc>
  </b>                        |       ...
  <c>                         |     </cc>
    <cc>                      |   </c>
      ...                     | </root>
    </cc>                     |
  </c>                        |
</root>                       |
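For reference, the step from INPUT to EXPECTED OUTPUT can be sketched directly on the plain Text.XML data types, without cursors or lenses. This is only a minimal sketch of the intended result, not one of the three approaches I tried; the names removeElems and removeFromDocument are made up for illustration, and whitespace text nodes around a removed element are left in place.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.XML

-- | Hypothetical helper: drop every descendant element with the given
-- name, together with its whole subtree. The element passed in is kept
-- even if it carries that name itself.
removeElems :: Name -> Element -> Element
removeElems name e = e { elementNodes = concatMap go (elementNodes e) }
  where
    go (NodeElement c)
      | elementName c == name = []                                  -- drop element and subtree
      | otherwise             = [NodeElement (removeElems name c)]  -- keep it, recurse into it
    go n = [n]                                                      -- text, comments, etc. stay

-- | Hypothetical helper: apply 'removeElems' to the root of a document.
removeFromDocument :: Name -> Document -> Document
removeFromDocument name doc =
  doc { documentRoot = removeElems name (documentRoot doc) }
```

With a document read via Text.XML.readFile, removeFromDocument "table" would be the call corresponding to the task described above.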
Minimal non-working example
{-# LANGUAGE OverloadedStrings #-}
module Minimal where

import Control.Lens
import Data.Conduit.Text as CT
import Data.Default
import qualified Data.Text.Lazy.IO as TIO
import Text.XML
import Text.XML.Cursor
import qualified Text.XML.Lens as L
import Data.Maybe (isNothing, isJust)

main :: IO ()
main = do test <- Text.XML.readFile def "./test.xml"
          -- approach 1: lens traversal/filter
          pput $ filterDocument test
          let cursor = fromDocument test
          -- approach 2: reset the children of the table cursor, then walk up to the root
          pput $ docUp $ elemUp $ getRoot ((head $ cursor $// checkName (== "table")) {child = []} )
          -- approach 3: recursively filter the children of the root cursor
          pput $ docUp $ elemUp $ filterChildren (checkName (/= "table")) cursor
          return ()

filterChildren :: Axis -> Cursor -> Cursor
filterChildren pred c = c {child = map (filterChildren pred) (c $/ pred) }

filterDocument :: Document -> Document
filterDocument doc = doc & (L.root.L.entire.filtered (\e -> isJust $ e^?L.named "table") .~ emptyElemt)
  where emptyElemt = Element "empty" mempty []

-- helper functions --

-- wrap an element in a document with an empty prologue and epilogue
docUp :: Element -> Document
docUp e = Document {documentRoot = e, documentPrologue = Prologue [] Nothing [], documentEpilogue = [] }

-- wrap the node under a cursor in a fresh "DOC" element
elemUp :: Cursor -> Element
elemUp cursor = Element {elementName = "DOC", elementAttributes = mempty , elementNodes = [node cursor]}

elemUp' :: [Cursor] -> Element
elemUp' cursors = Element {elementName = "DOC", elementAttributes = mempty , elementNodes = map node cursors}

-- follow the parent axis until there is no parent left
getRoot :: Cursor -> Cursor
getRoot c = let p = (c $| parent)
            in if null p then c else getRoot $ head p

-- pretty-print a document to stdout
pput :: Document -> IO ()
pput = TIO.putStrLn . renderText pretty
  where pretty = def {rsPretty = True}
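The two cursor-based attempts can be reduced to the small experiment below. It is only a sketch, and it rests on an assumption I have not confirmed in the documentation: that a Cursor stores its node in a field separate from child, so a record update on child leaves the node untouched. The names sample, clearChildren and nodeUnchanged are made up for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor

-- Small stand-in for test.xml: <root><table/><bb/></root>
sample :: Document
sample = Document (Prologue [] Nothing []) root []
  where
    root = Element "root" mempty
      [ NodeElement (Element "table" mempty [])
      , NodeElement (Element "bb" mempty [])
      ]

-- | Clear the cursor's child list with a record update, as in approach 2.
clearChildren :: Cursor -> Cursor
clearChildren c = c { child = [] }

-- | True when the Node carried by the cursor is the same before and
-- after the child list has been cleared.
nodeUnchanged :: Cursor -> Bool
nodeUnchanged c = node (clearChildren c) == node c
```

If nodeUnchanged (fromDocument sample) evaluates to True, that would be consistent with the unchanged output of the second and third attempts, since elemUp rebuilds the element from node cursor only.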
Output
> stack ghci
. . .
Ok, modules loaded: Minimal.
λ > main
<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>
...
</a>
<b>
<empty>
<!-- table entries -->
</empty>
<bb>
...
</bb>
</b>
<c>
<cc>
...
</cc>
</c>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<DOC>
<root>
<a>
...
</a>
<b>
<table>
<!-- table entries -->
</table>
<bb>
...
</bb>
</b>
<c>
<cc>
...
</cc>
</c>
</root>
</DOC>
<?xml version="1.0" encoding="UTF-8"?>
<DOC>
<root>
<a>
...
</a>
<b>
<table>
<!-- table entries -->
</table>
<bb>
...
</bb>
</b>
<c>
<cc>
...
</cc>
</c>
</root>
</DOC>