Here is a fragment of the XML file I am working with:

<page>
  <title>AccessibleComputing</title>
  <ns>0</ns>
  <id>10</id>
  <redirect title="Computer accessibility" />
  <revision>
    <id>381202555</id>
    <parentid>381200179</parentid>
    <timestamp>2010-08-26T22:38:36Z</timestamp>
    <contributor>
      <username>OlEnglish</username>
      <id>7181920</id>
    </contributor>
    <minor />
    <comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment>
    <text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>
    <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1>
    <model>wikitext</model>
    <format>text/x-wiki</format>
  </revision>
</page>
<page>
  <title>AfghanistanGeography</title>
  <ns>0</ns>
  <id>14</id>
  <redirect title="Geography of Afghanistan" />
  <revision>
    <id>407008307</id>
    <parentid>74466619</parentid>
    <timestamp>2011-01-10T03:56:19Z</timestamp>
    <contributor>
      <username>Graham87</username>
      <id>194203</id>
    </contributor>
    <minor />
    <comment>1 revision from [[:nost:AfghanistanGeography]]: import old edit, see [[User:Graham87/Import]]</comment>
    <text xml:space="preserve">#REDIRECT [[Geography of Afghanistan]] {{R from CamelCase}}</text>
    <sha1>0uwuuhiam59ufbu0uzt9lookwtx9f4r</sha1>
    <model>wikitext</model>
    <format>text/x-wiki</format>
  </revision>
</page>
<page>
  <title>AfghanistanPeople</title>
  <ns>0</ns>
  <id>15</id>
  <redirect title="Demography of Afghanistan" />
  <revision>
    <id>135089040</id>
    <parentid>74466558</parentid>
    <timestamp>2007-06-01T13:59:37Z</timestamp>
    <contributor>
      <username>RussBot</username>
      <id>279219</id>
    </contributor>
    <minor />
    <comment>Robot: Fixing [[Special:DoubleRedirects|double-redirect]] -&quot;Demographics of Afghanistan&quot; +&quot;Demography of Afghanistan&quot;</comment>
    <text xml:space="preserve">#REDIRECT [[Demography of Afghanistan]] {{R from CamelCase}}</text>
    <sha1>744dgrl7ef5p53yffn2a989ly1dyr8f</sha1>
    <model>wikitext</model>
    <format>text/x-wiki</format>
  </revision>
</page>

Now, given the value "AccessibleComputing", how do I retrieve the XMLInternalElementNode that corresponds to it? I tried using getNodeSet, but without success.

Thanks.

Updated question

I should have posted the whole sample.xml file in the first place. Here it is, followed by the problem I am facing:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.21wmf8</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      <namespace key="3" case="first-letter">User talk</namespace>
      <namespace key="4" case="first-letter">Wikipedia</namespace>
      <namespace key="5" case="first-letter">Wikipedia talk</namespace>
      <namespace key="6" case="first-letter">File</namespace>
      <namespace key="7" case="first-letter">File talk</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">MediaWiki talk</namespace>
      <namespace key="10" case="first-letter">Template</namespace>
      <namespace key="11" case="first-letter">Template talk</namespace>
      <namespace key="12" case="first-letter">Help</namespace>
      <namespace key="13" case="first-letter">Help talk</namespace>
      <namespace key="14" case="first-letter">Category</namespace>
      <namespace key="15" case="first-letter">Category talk</namespace>
      <namespace key="100" case="first-letter">Portal</namespace>
      <namespace key="101" case="first-letter">Portal talk</namespace>
      <namespace key="108" case="first-letter">Book</namespace>
      <namespace key="109" case="first-letter">Book talk</namespace>
      <namespace key="446" case="first-letter">Education Program</namespace>
      <namespace key="447" case="first-letter">Education Program talk</namespace>
      <namespace key="710" case="first-letter">TimedText</namespace>
      <namespace key="711" case="first-letter">TimedText talk</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>AccessibleComputing</title>
    <ns>0</ns>
    <id>10</id>
    <redirect title="Computer accessibility" />
    <revision>
      <id>381202555</id>
      <parentid>381200179</parentid>
      <timestamp>2010-08-26T22:38:36Z</timestamp>
      <contributor>
        <username>OlEnglish</username>
        <id>7181920</id>
      </contributor>
      <minor />
      <comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment>
      <text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>
      <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>
  <page>
    <title>History</title>
    <ns>0</ns>
    <id>13</id>
    <redirect title="History of " />
    <revision>
      <id>74466652</id>
      <parentid>15898948</parentid>
      <timestamp>2006-09-08T04:15:52Z</timestamp>
      <contributor>
        <username>Rory096</username>
        <id>750223</id>
      </contributor>
      <comment>cat rd</comment>
      <text xml:space="preserve">#REDIRECT [[History of ]] {{R from CamelCase}}</text>
      <sha1>d4tdz2eojqzamnuockahzcbrgd1t9oi</sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>
  <page>
    <title>Geography</title>
    <ns>0</ns>
    <id>14</id>
    <redirect title="Geography of " />
    <revision>
      <id>407008307</id>
      <parentid>74466619</parentid>
      <timestamp>2011-01-10T03:56:19Z</timestamp>
      <contributor>
        <username>Graham87</username>
        <id>194203</id>
      </contributor>
      <minor />
      <comment>1 revision from [[:nost:Geography]]: import old edit, see [[User:Graham87/Import]]</comment>
      <text xml:space="preserve">#REDIRECT [[Geography of ]] {{R from CamelCase}}</text>
      <sha1>0uwuuhiam59ufbu0uzt9lookwtx9f4r</sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>
  <page>
    <title>People</title>
    <ns>0</ns>
    <id>15</id>
    <redirect title="Demography of " />
    <revision>
      <id>135089040</id>
      <parentid>74466558</parentid>
      <timestamp>2007-06-01T13:59:37Z</timestamp>
      <contributor>
        <username>RussBot</username>
        <id>279219</id>
      </contributor>
      <minor />
      <comment>Robot: Fixing [[Special:DoubleRedirects|double-redirect]] -&quot;Demographics of &quot; +&quot;Demography of &quot;</comment>
      <text xml:space="preserve">#REDIRECT [[Demography of ]] {{R from CamelCase}}</text>
      <sha1>744dgrl7ef5p53yffn2a989ly1dyr8f</sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>
</mediawiki>

How do I get the page node whose title element has the value "AccessibleComputing"? I tried the following:

doc = xmlTreeParse('sample.xml',useInternalNodes=TRUE)
getNodeSet(doc, "//page[title=\"AccessibleComputing\"]")

It returned

list()
attr(,"class")
[1] "XMLNodeSet"

Expected output:

[[1]]
<page>
  <title>AccessibleComputing</title>
  <ns>0</ns>
  <id>10</id>
  <redirect title="Computer accessibility"/>
  <revision>
    <id>381202555</id>
    <parentid>381200179</parentid>
    <timestamp>2010-08-26T22:38:36Z</timestamp>
    <contributor>
      <username>OlEnglish</username>
      <id>7181920</id>
    </contributor>
    <minor/>
    <comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment>
    <text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}}    </text>
    <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1>
    <model>wikitext</model>
    <format>text/x-wiki</format>
  </revision>
</page> 

attr(,"class")
[1] "XMLNodeSet"

I think my XPath query is incorrect: the single "siteinfo" node seems to break my attempt. Any suggestions?

2 Answers

To parse your fragment, I wrapped it in a new root tag:

<pages>
....
</pages>

Then, using xpathSApply, I can retrieve all the title elements:

library(XML)
doc = xmlTreeParse('c:/temp/testxml.xml',useInternalNodes=TRUE)
xpathSApply(doc,'//page/title',xmlValue)
## [1] "AccessibleComputing"  "AfghanistanGeography" "AfghanistanPeople"

You can also use getNodeSet:

getNodeSet(doc,'//page/title')
[[1]]
<title>AccessibleComputing</title> 

[[2]]
<title>AfghanistanGeography</title> 

[[3]]
<title>AfghanistanPeople</title> 
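
The wrapping step can also be done in code instead of editing the file by hand, and the same approach selects the page node by its title. This is a minimal sketch, assuming the fragment is available as a character string; the `fragment` variable and its shortened content are illustrative and not taken from the original file.

```r
library(XML)

# Hypothetical, shortened fragment: several <page> elements with no common root.
fragment <- paste0(
  "<page><title>AccessibleComputing</title><ns>0</ns><id>10</id></page>",
  "<page><title>AfghanistanGeography</title><ns>0</ns><id>14</id></page>"
)

# Wrap the fragment in a single root element so that it is well-formed XML.
wrapped <- paste0("<pages>", fragment, "</pages>")
doc <- xmlParse(wrapped, asText = TRUE)

# Select the <page> whose <title> child equals the requested value.
pages <- getNodeSet(doc, "//page[title='AccessibleComputing']")
page <- pages[[1]]

# The selected element is an internal node; read one of its child values.
class(page)
xmlValue(page[["id"]])
```

Here `pages` is an XMLNodeSet, and each of its elements is an internal element node that can be passed on to functions such as xmlValue or xpathSApply.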
answered 2013-03-07T08:22:10.660

If you are looking for any page whose title has the value AccessibleComputing, use getNodeSet(doc,'//page[title="AccessibleComputing"]').

If you want any node that has a direct child named title with the value AccessibleComputing, use getNodeSet(doc,'//node()[title="AccessibleComputing"]').

library(XML)

xml <- "<pages><page>\n<title>AccessibleComputing</title>\n<ns>0</ns>\n<id>10</id>\n<redirect title=\"Computer accessibility\" />\n<revision>\n<id>381202555</id>\n<parentid>381200179</parentid>\n<timestamp>2010-08-26T22:38:36Z</timestamp>\n<contributor>\n<username>OlEnglish</username>\n<id>7181920</id>\n</contributor>\n<minor />\n<comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment>\n<text xml:space=\"preserve\"> %InLiNe_IdEnTiFiEr% \"#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>\"\n<sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1>\n<model>wikitext</model>\n<format>text/x-wiki</format>\n</revision>\n</page>\n<page>\n<title>AfghanistanGeography</title>\n<ns>0</ns>\n<id>14</id>\n<redirect title=\"Geography of Afghanistan\" />\n<revision>\n<id>407008307</id>\n<parentid>74466619</parentid>\n<timestamp>2011-01-10T03:56:19Z</timestamp>\n<contributor>\n<username>Graham87</username>\n<id>194203</id>\n</contributor>\n<minor />\n<comment>1 revision from [[:nost:AfghanistanGeography]]: import old edit, see [[User:Graham87/Import]]</comment>\n<text xml:space=\"preserve\"> %InLiNe_IdEnTiFiEr% \"#REDIRECT [[Geography of Afghanistan]] {{R from CamelCase}}</text>\"\n<sha1>0uwuuhiam59ufbu0uzt9lookwtx9f4r</sha1>\n<model>wikitext</model>\n<format>text/x-wiki</format>\n</revision>\n</page>\n<page>\n<title>AfghanistanPeople</title>\n<ns>0</ns>\n<id>15</id>\n<redirect title=\"Demography of Afghanistan\" />\n<revision>\n<id>135089040</id>\n<parentid>74466558</parentid>\n<timestamp>2007-06-01T13:59:37Z</timestamp>\n<contributor>\n<username>RussBot</username>\n<id>279219</id>\n</contributor>\n<minor />\n<comment>Robot: Fixing [[Special:DoubleRedirects|double-redirect]] -&quot;Demographics of Afghanistan&quot; +&quot;Demography of Afghanistan&quot;</comment>\n<text xml:space=\"preserve\"> %InLiNe_IdEnTiFiEr% \"#REDIRECT [[Demography of Afghanistan]] {{R from 
CamelCase}}</text>\"\n<sha1>744dgrl7ef5p53yffn2a989ly1dyr8f</sha1>\n<model>wikitext</model>\n<format>text/x-wiki</format>\n</revision>\n</page></pages>"


doc = xmlTreeParse(xml, useInternalNodes = TRUE)


# If you want to get page which has immediate child node called title whose
# value is 'AccessibleComputing'
getNodeSet(doc, "//page[title=\"AccessibleComputing\"]")
## [[1]]
## <page>
##   <title>AccessibleComputing</title>
##   <ns>0</ns>
##   <id>10</id>
##   <redirect title="Computer accessibility"/>
##   <revision><id>381202555</id><parentid>381200179</parentid><timestamp>2010-08-26T22:38:36Z</timestamp><contributor><username>OlEnglish</username><id>7181920</id></contributor><minor/><comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment><text xml:space="preserve"> %InLiNe_IdEnTiFiEr% "#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>"
## <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1><model>wikitext</model><format>text/x-wiki</format></revision>
## </page> 
## 
## attr(,"class")
## [1] "XMLNodeSet"



# If you want to get any node which has immediate child node called title whose
# value is 'AccessibleComputing'
getNodeSet(doc, "//node()[title=\"AccessibleComputing\"]")
## [[1]]
## <page>
##   <title>AccessibleComputing</title>
##   <ns>0</ns>
##   <id>10</id>
##   <redirect title="Computer accessibility"/>
##   <revision><id>381202555</id><parentid>381200179</parentid><timestamp>2010-08-26T22:38:36Z</timestamp><contributor><username>OlEnglish</username><id>7181920</id></contributor><minor/><comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment><text xml:space="preserve"> %InLiNe_IdEnTiFiEr% "#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>"
## <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1><model>wikitext</model><format>text/x-wiki</format></revision>
## </page> 
## 
## attr(,"class")
## [1] "XMLNodeSet"
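
Note that the full sample.xml in the updated question declares a default namespace on its root element (xmlns="http://www.mediawiki.org/xml/export-0.8/"). In XPath 1.0, unprefixed names such as page only match elements in no namespace, which is the usual cause of the empty node set shown in the question, rather than the siteinfo node. The sketch below uses a shortened stand-in for that file (an assumption, keeping only the root tag and a few elements); the prefix `mw` is an arbitrary name chosen here.

```r
library(XML)

# Shortened stand-in for sample.xml: same root element, far fewer children.
xml <- paste0(
  '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" ',
  'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ',
  'xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ ',
  'http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">',
  '<siteinfo><sitename>Wikipedia</sitename></siteinfo>',
  '<page><title>AccessibleComputing</title><id>10</id></page>',
  '<page><title>History</title><id>13</id></page>',
  '</mediawiki>'
)
doc <- xmlParse(xml, asText = TRUE)

# Bind an arbitrary prefix ("mw") to the document's default namespace URI
# and use that prefix on every element name in the XPath expression.
ns <- c(mw = "http://www.mediawiki.org/xml/export-0.8/")
with_ns <- getNodeSet(doc, "//mw:page[mw:title='AccessibleComputing']",
                      namespaces = ns)
xmlValue(with_ns[[1]][["id"]])

# Alternative that sidesteps namespaces by comparing local names only.
by_local <- getNodeSet(
  doc,
  "//*[local-name()='page'][*[local-name()='title']='AccessibleComputing']"
)
length(by_local)
```

The prefixed form is usually easier to read; the local-name() form is convenient when the namespace URI varies between dump versions.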
answered 2013-03-07T10:37:16.763