
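I have a question about HXT. I want to parse an OWL file, but my arrow is giving me trouble: it refuses to parse some trees. Here is where I see the problem. First, the code: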

{-# LANGUAGE Arrows #-}
import System.Environment  -- for getArgs
import Data.List.Split (splitOn)
import Text.XML.HXT.Core


data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)


main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser


parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"



split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l


atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = (split subClass) }

With this, I should be able to parse every such node in an OWL file, for example:

<owl:Class rdf:about="Damien">
    <rdfs:subClassOf rdf:resource="PurchaseableItem"/>
</owl:Class>

But when I try to parse a tree like the following, it is not processed at all and is simply dropped!

<owl:Class rdf:about="&camera;BodyWithNonAdjustableShutterSpeed">
    <owl:equivalentClass>
        <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
                <rdf:Description rdf:about="&camera;Body"/>
                <owl:Restriction>
                    <owl:onProperty rdf:resource="&camera;shutter-speed"/>
                    <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:cardinality>
                </owl:Restriction>
            </owl:intersectionOf>
        </owl:Class>
    </owl:equivalentClass>
</owl:Class>

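Why? Because the subclass node does not exist! But even when there is no subclass, I still want the class to be available and stored in my data. So how can I do that?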


My latest version:

{-# LANGUAGE Arrows #-}
import System.Environment  -- for getArgs
import Data.List.Split (splitOn)
import Text.XML.HXT.Core

data Class = Class {
                    name ::String,
                    subClassOf :: String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")
parseSubClass = (getAttrValue "rdf:resource") `orElse` arr (const "" )

-- Test (this definition needs revisiting): it fails if the name contains "#"
split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)
getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = split subClass }

1 Answer


You need to decide what you want to happen when the SubClass node is absent. As I see it, you have two options:

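  • A missing SubClass node means that subClass is the empty string. In that case, just change the parser to fall back to the empty string when the arrow built from atTag "rdfs:subClassOf" fails: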

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass) `orElse` arr (const "")
    
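  • A missing SubClass node means that subClass is Nothing. This requires changing your data definition so that subClassOf has type Maybe String, but after that it is very similar to the previous option: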

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = fmap split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass >>> arr Just)
            `orElse` arr (const Nothing)
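
The reason the class disappears in the original version is that every step of a proc block over a list arrow can yield zero or more results, and a step with zero results removes the whole record. The following is only a minimal sketch of that behaviour, not the asker's full program: the sample XML, the root wrapper element, and the names sampleXml, strict and lenient are assumptions made for illustration, and it uses runLA with xread on an in-memory string instead of readDocument so it stays pure.

```haskell
{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

-- Hypothetical sample: one class with a subClassOf child, one without.
sampleXml :: String
sampleXml = concat
  [ "<root>"
  , "<owl:Class rdf:about=\"Damien\">"
  , "<rdfs:subClassOf rdf:resource=\"PurchaseableItem\"/>"
  , "</owl:Class>"
  , "<owl:Class rdf:about=\"Body\"/>"
  , "</root>"
  ]

atTag :: String -> LA XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

-- Same shape as the question: the subClassOf step has no result for
-- "Body", so that class produces no output at all.
strict :: LA XmlTree (String, String)
strict = atTag "owl:Class" >>>
  proc l -> do
    n <- getAttrValue "rdf:about" -< l
    s <- atTag "rdfs:subClassOf" >>> getAttrValue "rdf:resource" -< l
    returnA -< (n, s)

-- Same shape as the Maybe option: orElse supplies a result when the
-- left-hand arrow yields nothing, so every class is kept.
lenient :: LA XmlTree (String, Maybe String)
lenient = atTag "owl:Class" >>>
  proc l -> do
    n <- getAttrValue "rdf:about" -< l
    s <- getSub -< l
    returnA -< (n, s)
  where
    getSub =
      (atTag "rdfs:subClassOf" >>> getAttrValue "rdf:resource" >>> arr Just)
      `orElse` constA Nothing

main :: IO ()
main = do
  print (runLA (xread >>> strict) sampleXml)
  print (runLA (xread >>> lenient) sampleXml)
```

On this sample, strict should return only the Damien entry, while lenient should return both classes, with Nothing as the superclass of Body.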
    

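Just so we are clear, since you said in the comments that this does not work: here is the exact, complete program I am running, and it works fine for me: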

{-# LANGUAGE Arrows #-}
import System.Environment  -- for getArgs
import Data.List.Split (splitOn)
import Text.XML.HXT.Core

data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")
             (getAttrValue "rdf:about")
             (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"

split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    subClass <- getSubClass -< l
    returnA -< Class { name = split className, subClassOf = split subClass }
    where
      getSubClass =
        (atTag "rdfs:subClassOf" >>> parseSubClass)
        `orElse` arr (const "")

Note that if you really don't want to combine multiple arrow steps with >>> or <<<, another possibility is to use an inner proc:

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    subClass <- (proc l' -> do
      s <- atTag "rdfs:subClassOf" -< l'
      parseSubClass -< s)
      `orElse` constA "" -< l
    returnA -< Class { name = split className, subClassOf = split subClass}
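
A further variant, which is not part of the code above: HXT's ArrowList class also offers withDefault, which returns a fixed value when an arrow yields no results. The sketch below assumes the same atTag helper and tries it on two small in-memory fragments via runLA and xread; the fragments and the name getSubClass are made up for illustration.

```haskell
import Text.XML.HXT.Core

atTag :: String -> LA XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

-- The rdf:resource of each rdfs:subClassOf found below the node,
-- or a single "" when there is none.
getSubClass :: LA XmlTree String
getSubClass =
  (atTag "rdfs:subClassOf" >>> getAttrValue "rdf:resource") `withDefault` ""

-- Hypothetical fragments: a class with a superclass and one without.
withSub, withoutSub :: String
withSub    = "<owl:Class rdf:about=\"A\"><rdfs:subClassOf rdf:resource=\"B\"/></owl:Class>"
withoutSub = "<owl:Class rdf:about=\"A\"/>"

main :: IO ()
main = do
  print (runLA (xread >>> getSubClass) withSub)
  print (runLA (xread >>> getSubClass) withoutSub)
```

This behaves like the orElse version with a constant fallback, so choosing between them is mostly a matter of taste.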
answered 2014-04-22T17:55:06.783