3

HXT目前正在学习使用它来解析GPX文件。一个例子是here。到目前为止,我有以下内容:

import Data.Time
import Text.XML.HXT.Core

data Gpx    = Gpx [Trk]           deriving (Show)
data Trk    = Trk [TrkSeg]        deriving (Show)
data TrkSeg = TrkSeg [TrkPt]      deriving (Show)
data TrkPt  = TrkPt Double Double deriving (Show)

parseGpx =
  getChildren >>> isElem >>> hasName "gpx" >>>
  getChildren >>> isElem >>> hasName "trk" >>>
  parseGpxTrk >>> arr Gpx

parseGpxTrk = undefined
parseGpxTrkSegs = undefined

您可以看到它不完整,但仍应进行类型检查。不幸的是,我已经遇到了一个错误:

Couldn't match type ‘Trk’ with ‘[Trk]’
Expected type: Trk -> Gpx
  Actual type: [Trk] -> Gpx
In the first argument of ‘arr’, namely ‘Gpx’
In the second argument of ‘(>>>)’, namely ‘arr Gpx’

这个错误说的是我试图parseGpxTrk通过构造函数从箭头传递每个匹配的项目arr Gpx,但我真正想要的是通过arr Gpx构造函数传递整个匹配列表。

那么,我如何获得(或一般的箭头?)通过我的构造函数HXT将匹配项作为列表arr Gpx传递,而不是通过构造函数传递列表中的每个条目arr Gpx

4

1 回答 1

1

这是一个对我来说似乎相当不错的解决方案

{-# LANGUAGE Arrows #-}

import Data.Maybe
import Text.Read
import Text.XML.HXT.Core
import Control.Applicative

data Gpx    = Gpx [Trk]           deriving (Show)
data Trk    = Trk [TrkSeg]        deriving (Show)
data TrkSeg = TrkSeg [TrkPt]      deriving (Show)
data TrkPt  = TrkPt Double Double deriving (Show)

最棘手的可能是parseTrkPt因为为了正确执行此操作,您必须处理将Strings 解析为Double,这可能会失败。我已经决定让它返回 a Maybe TrkPt,然后进一步处理:

elemsNamed :: ArrowXml cat => String -> cat XmlTree XmlTree
elemsNamed name = isElem >>> hasName name

parseTrkPt :: ArrowXml cat => cat XmlTree (Maybe TrkPt)
parseTrkPt = elemsNamed "trkpt" >>>
    proc trkpt -> do
        lat <- getAttrValue "lat" -< trkpt
        lon <- getAttrValue "lon" -< trkpt
        returnA -< TrkPt <$> readMaybe lat <*> readMaybe lon

我也使用了proc这里的语法,因为我认为它更简洁。具有TrkPt <$> readMaybe lat <*> readMaybe lon类型Maybe TrkPtNothing如果其中一个readMaybes 返回,则将返回Nothing。我们现在可以汇总所有成功的结果:

parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg
parseTrkSeg =
    elemsNamed "trkseg" >>>
    (getChildren >>> parseTrkPt >>. catMaybes) >. TrkSeg

括号在这里很重要,我花了一段时间才弄清楚那部分。根据您放置括号的位置,您将获得不同的结果,例如[TrkSeg [TrkPt a b], TrkSeg [TrkPt c d]]代替[TrkSeg [TrkPt a b, TrkPt c d]]. 解析器的 next 都遵循类似的模式:

parseTrk :: ArrowXml cat => cat XmlTree Trk
parseTrk =
    elemsNamed "trk" >>>
    (getChildren >>> parseTrkSeg) >. Trk

parseGpx :: ArrowXml cat => cat XmlTree Gpx
parseGpx =
    elemsNamed "gpx" >>>
    (getChildren >>> parseTrk) >. Gpx

然后你可以很简单地运行它,尽管你仍然必须钻过根元素:

main :: IO ()
main = do
    gpxs <- runX $ readDocument [withRemoveWS yes] "ana.gpx"
                >>> getChildren
                >>> parseGpx
    -- Pretty print the document
    forM_ gpxs $ \(Gpx trks) -> do
        putStrLn "GPX:"
        forM_ trks $ \(Trk segs) -> do
            putStrLn "\tTRK:"
            forM_ segs $ \(TrkSeg pts) -> do
                putStrLn "\t\tSEG:"
                forM_ pts $ \pt -> do
                    putStr "\t\t\t"
                    print pt

诀窍是使用类型类中的方法ArrowList,特别是>.具有 type的方法a b c -> ([c] -> d) -> a b d。它聚合来自 的元素ArrowList,将其传递给将其转换为新类型的函数,然后ArrowList在该新类型上输出一个新的d

如果你愿意,你甚至可以为最后 3 个解析器抽象一点:

nestedListParser :: ArrowXml cat => String -> cat XmlTree a -> ([a] -> b) -> cat XmlTree b
nestedListParser name subparser constructor
    =   elemsNamed name
    >>> (getChildren >>> subparser)
    >.  constructor

parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg
parseTrkSeg = nestedListParser "trkseg" (parseTrkPt >>. catMaybes) TrkSeg

parseTrk :: ArrowXml cat => cat XmlTree Trk
parseTrk = nestedListParser "trk" parseTrkSeg Trk

parseGpx :: ArrowXml cat => cat XmlTree Gpx
parseGpx = nestedListParser "gpx" parseTrk Gpx

如果您想完成 GPX 文件的其余语法,这可能会派上用场。

于 2015-05-08T03:05:18.190 回答