我正在尝试为 XML.Light 数据类型编写一个高效的 XML 呈现器,并且我正在尝试这样做,Data.Text.Lazy.Builder
因为这似乎是一个显而易见的选择。但是,我很难从我的解决方案中获得任何性能:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, unpack)
import Text.XML.Light
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.Builder as LB
import Data.Foldable (foldMap)
import Data.Monoid (mconcat)
data Tag = Tag !Text
data Artist = Artist { artistName :: !Text , artistTags :: ![Tag] }
class ToXML a where toXML :: a -> Content
instance ToXML Artist where
toXML a = Elem $
Element (unqual "artist") []
[ text (artistName a)
, Elem $ Element (unqual "tag-list") []
(map toXML (artistTags a))
Nothing
]
Nothing
instance ToXML Tag where
toXML (Tag t) = Elem $ Element (unqual "tag") [] [ text t ] Nothing
text :: Text -> Content
text t = Text $ CData CDataText (unpack t) Nothing
render :: Content -> LB.Builder
render (Elem e) = renderElement e
render (Text s) = LB.fromString (cdData s)
renderElement :: Element -> LB.Builder
renderElement element = mconcat
[ LB.singleton '<'
, LB.fromString . qName . elName $ element
, LB.singleton '>'
, foldMap render (elContent element)
, LB.fromText "</"
, LB.fromString . qName .elName $ element
, LB.singleton '>'
]
main :: IO ()
main = let artist = Artist "Nirvana" (replicate 5000000 (Tag "Hi"))
xml = Element (unqual "metadata") [] [ toXML artist ] Nothing
in print (LT.length . LB.toLazyText . renderElement $ xml)
根据+RTS -s
:
7,368,153,472 bytes allocated in the heap
2,625,983,944 bytes copied during GC
708,149,024 bytes maximum residency (13 sample(s))
21,954,496 bytes maximum slop
1443 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 14270 colls, 0 par 1.65s 1.69s 0.0001s 0.0009s
Gen 1 13 colls, 0 par 2.57s 2.80s 0.2157s 1.2388s
TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.81s ( 1.84s elapsed)
GC time 4.22s ( 4.50s elapsed)
EXIT time 0.07s ( 0.09s elapsed)
Total time 6.11s ( 6.43s elapsed)
Alloc rate 4,064,658,288 bytes per MUT second
Productivity 30.8% of total user, 29.3% of total elapsed
这太可怕了。不仅是最低的生产力,堆中分配了超过 7GiB 来渲染 64MB 的 XML。这似乎非常低效!但是,我不知道所有这些垃圾实际上是从哪里来的。我生成了一个堆配置文件+RTS -p
并使用以下内容渲染它hp2ps
:
我还+RTS -l
使用 ThreadScope 运行它并渲染它:
遗憾的是,我现在不知道该怎么做,就是将这些部分放在一起以提高生产力并降低内存使用量。我确实想知道 in 的类型XML.Light
是否不是最优的(没有严格性,String
over Text
)但仍然 -这么慢?
我还观察到其他一些我觉得有点奇怪的东西。如果我main
改为:
main :: IO ()
main = let artist = Artist "Nirvana" (replicate 5000000 (Tag "Hi"))
xml = Element (unqual "metadata") [] [ toXML artist ] Nothing
in print (LT.length $ LB.toLazyText $ mconcat $ map (render.toXML) $ artistTags artist)
生产力飙升至 94%,所以也许这与递归有关,toXML
因为它存在问题并且过于懒惰。