haskell - put xml into a hash table

Question

I am trying to get the information out of an XML file into a lookup table. So far I have been reading about which libraries are available and how to use them. I went with hxt and hashtables. Here is the file:

<?xml version="1.0" encoding="UTF-8" ?>

<tables>

  <table name="nametest1">
    test1
  </table>

  <table name="nametest2">
    test2
  </table>

</tables>

I would like to have the following pairs:
nametest1, test1
nametest2, test2
etc...

-- | We get the xml into a hash
getTables :: IO (H.HashTable String String)
getTables = do
  confPath <- getEnv "ENCODINGS_XML_PATH"
  doc      <- runX $ readDocument [withValidate no] confPath
  -- this is the part I don't have
  -- I understand the whole hashtable create and insert process
  -- It is extracting the xml info that is blocking me
  where -- I think I might use the following so I shamelessly took them from the net
    atTag tag = deep (isElem >>> hasName tag)
    text      = getChildren >>> getText

I saw many examples of how to do similar things but I can't figure out how to get the name attribute at each node.

Cheers, rakwatt

score 1 · Accepted Answer

Here is an example that reads a file named test.xml and just prints out the (name,text) pairs:

import           Text.XML.HXT.Core

-- | Gets the name attribute and the content of the selected items as a pair
getAttrAndText :: (ArrowXml a) => a XmlTree (String, String)
getAttrAndText =
      getAttrValue "name"             -- Get the value of the attribute "name"
  &&& deep getText                    -- And pair it with the text of the node


-- | Gets all "table" items under a root tables item
getTableItem :: (ArrowXml a) => a XmlTree XmlTree
getTableItem =
      deep (hasName "tables")          -- Find a tag <tables> anywhere in the document
  >>> getChildren                      -- Get all children of that tag
  >>> hasName "table"                  -- Filter those that have the tag <table>
  >>> hasAttr "name"                   -- Filter those that have an attribute name

-- | The main function
main :: IO ()
main = (print =<<) $ runX $                       -- Print the result
      readDocument [withValidate no] "test.xml"   -- Read the document
  >>> getTableItem                                -- Get all table items
  >>> getAttrAndText                              -- Get the attribute 'name' and the text of those nodes

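The pairs are constructed in getAttrAndText. The remaining functions just open the file and select all <table> tags that are immediate children of a <tables> tag. You may still want to strip the leading whitespace from the text.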
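
To connect this with the hash table from the question, here is a minimal sketch that trims the surrounding whitespace and loads the pairs into a table. It assumes the hashtables package (Data.HashTable.IO) and picks BasicHashTable as the table type, which may differ from the type alias used in the question. The names trim, getTablePairs and getTables are only illustrative, and the path test.xml is the same placeholder as above.

import           Data.Char         (isSpace)
import           Data.List         (dropWhileEnd)
import qualified Data.HashTable.IO as H
import           Text.XML.HXT.Core

-- | Strip leading and trailing whitespace (hypothetical helper)
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Read the document and collect the (name, text) pairs of all
--   <table> elements that carry a name attribute
getTablePairs :: FilePath -> IO [(String, String)]
getTablePairs path = runX $
      readDocument [withValidate no] path
  >>> deep (isElem >>> hasName "table" >>> hasAttr "name")
  >>> (getAttrValue "name" &&& (deep getText >>> arr trim))

-- | Build the hash table from the pairs (assumes BasicHashTable)
getTables :: FilePath -> IO (H.BasicHashTable String String)
getTables path = getTablePairs path >>= H.fromList

main :: IO ()
main = do
  table <- getTables "test.xml"
  H.lookup table "nametest1" >>= print   -- look up one key as a check

H.fromList builds the table in one step. If you would rather insert the entries one by one, H.new followed by H.insert for each pair should work the same way.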
