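I am trying to populate an ontology with data extracted from the Marvel Database wikia (you can export an XML dump containing all of the wiki's information). My problem is that this XML file is too large to do anything with (more than 500 MB). I tried using XSLT to convert it into a very simple RDF file, but the size of the XML file makes that nearly impossible.

The XML document consists of pages like this: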

<page>
<title>Aeroika (Earth-616)</title>
<ns>0</ns>
<id>1035</id>
  <sha1>11t0be5viqp0vsj8zwglfu3wea8fou4</sha1>
<revision>
  <id>1786343</id>
  <timestamp>2011-10-04T17:49:37Z</timestamp>
  <contributor>
    <username>HamsterMan</username>
    <id>2082346</id>
  </contributor>
  <minor/>
  <text xml:space="preserve" bytes="1652">{{Marvel Database:Character Template
| Image                   = Aeroika (Earth-616).jpg
| RealName                = Aeroika
| CurrentAlias            = Aeroika
| Aliases                 = 
| Identity                = 
| Affiliation             = [[Defenders (Earth-616)|Defenders]]
| Relatives               = 
| Universe                = Earth-616
| BaseOfOperations        = [[Tunnelworld]]

| Gender                  = Male
| Height                  = 
| Weight                  = 
| Eyes                    = 
| Hair                    = Gold
| UnusualSkinColour       = Gold
| UnusualFeatures         = Wings growing out of his head.
}}
[[Category:Flight]]</text>
</revision>
</page>

For this case, for example, I wrote an XSLT stylesheet that extracts the important data into RDF.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:si="http://www.w3schools.com/rdf/">

<xsl:for-each select="page">
    <xsl:choose>
        <xsl:when test="contains(revision/text, 'Character Template')">
            <rdf:Description rdf:about="{title}">
                <Image><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Image'),'|'),'=')" /></Image>
                <RealName><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'RealName'),'|'),'=')" /></RealName>
                <CurrentAlias><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'CurrentAlias'),'|'),'=')" /></CurrentAlias>
                <Aliases><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Aliases'),'|'),'=')" /></Aliases>
                <Identity><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Identity'),'|'),'=')" /></Identity>
                <Affiliation><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Affiliation'),'|'),'=')" /></Affiliation>
                <Relatives><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Relatives'),'|'),'=')" /></Relatives>
                <Universe><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Universe'),'|'),'=')" /></Universe>
                <BaseOfOperations><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'BaseOfOperations'),'|'),'=')" /></BaseOfOperations>
                <Gender><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Gender'),'|'),'=')" /></Gender>
                <Height><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Height'),'|'),'=')" /></Height>
                <Weight><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Weight'),'|'),'=')" /></Weight>
                <Eyes><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Eyes'),'|'),'=')" /></Eyes>
                <Hair><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Hair'),'|'),'=')" /></Hair>
                <UnusualSkinColour><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'UnusualSkinColour'),'|'),'=')" /></UnusualSkinColour>
                <UnusualFeatures><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'UnusualFeatures'),'|'),'=')" /></UnusualFeatures>
                <Citizenship><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Citizenship'),'|'),'=')" /></Citizenship>
                <MaritalStatus><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'MaritalStatus'),'|'),'=')" /></MaritalStatus>
                <Occupation><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Occupation'),'|'),'=')" /></Occupation>
                <Education><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Education'),'|'),'=')" /></Education>
                <Origin><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Origin'),'|'),'=')" /></Origin>
                <PlaceOfBirth><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'PlaceOfBirth'),'|'),'=')" /></PlaceOfBirth>
                <Creators><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Creators'),'|'),'=')" /></Creators>
                <First><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'First'),'|'),'=')" /></First>
                <HistoryText><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'HistoryText'),'|'),'=')" /></HistoryText>
                <Powers><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Powers'),'|'),'=')" /></Powers>
                <Abilities><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Abilities'),'|'),'=')" /></Abilities>
                <Strength><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Strength'),'|'),'=')" /></Strength>
                <Weaknesses><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Weaknesses'),'|'),'=')" /></Weaknesses>
                <Equipement><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Equipement'),'|'),'=')" /></Equipement>
                <Transportation><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Transportation'),'|'),'=')" /></Transportation>      
                <Weapons><xsl:value-of select="substring-after(substring-before(substring-after(revision/text, 'Weapons'),'|'),'=')" /></Weapons>
            </rdf:Description>
        </xsl:when>
        <xsl:otherwise>
        </xsl:otherwise>
    </xsl:choose>
</xsl:for-each>
</rdf:RDF>
</xsl:template>

</xsl:stylesheet> 

Do you have any idea how I could do this? Thanks

1 Answer

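Your XSLT stylesheet converts "plain" XML into RDF/XML syntax. The result will be just as large, or even larger, and almost as hard to process. On top of that, RDF/XML is complex and error-prone to write by hand, and debugging the XSLT will be a nightmare.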

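If your goal is to make your dataset more compact and easier to process, I would suggest converting your XML to RDF Turtle or RDF N-Triples syntax instead. These are very simple, compact, text-based formats that are well suited to streaming, and any RDF-aware software can read and write them.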
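To give a feel for how little machinery N-Triples needs, here is a minimal Python sketch that turns one wiki field into one N-Triples line. The two `example.org` namespaces and the helper names `escape_literal` and `to_ntriple` are made up for illustration; substitute whatever URIs your ontology actually uses.

```python
from urllib.parse import quote

# Hypothetical namespaces, chosen only for illustration.
RESOURCE_NS = "http://example.org/marvel/resource/"
PROPERTY_NS = "http://example.org/marvel/property/"


def escape_literal(value: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriple(title: str, field: str, value: str) -> str:
    """Return one N-Triples line: <subject> <predicate> "literal" ."""
    # Percent-encode the page title and field name so the IRIs stay valid.
    subject = RESOURCE_NS + quote(title.replace(" ", "_"), safe="")
    predicate = PROPERTY_NS + quote(field, safe="")
    return f'<{subject}> <{predicate}> "{escape_literal(value)}" .'


if __name__ == "__main__":
    print(to_ntriple("Aeroika (Earth-616)", "Hair", "Gold"))
```

Because every triple is a self-contained line, the output can be appended to a file as each page is processed, without ever holding the whole dataset in memory.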

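You can use XSLT, or, if that gives you scalability problems, any programming or scripting language with basic XML support: take a streaming XML parser and hook it into a simple script or program that processes the parser output and creates the RDF data on the fly. Alternatively, given that your input XML is fairly regular in structure, you could even skip the XML parser entirely and just hack together a few regular expressions to read the data. Use whichever technology you are most comfortable with.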

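Of course, you could also try an end-user tool with built-in support for this kind of thing. TopBraid Composer, for example, offers some fancy features for this sort of conversion out of the box.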

answered 2012-11-18T17:53:50.977