The feed only gives you the information as XML, so you can parse it with the XML package.
library(XML)
url <- 'http://housesofstones.com/blog/feed/atom/'
# Download and parse the data
xml_data <- xmlParse(url)
# Convert the xml structure to a list so you can work with it in R
xml_list <- xmlToList(xml_data)
str(head(xml_list))
List of 6
$ title :List of 2
..$ text : chr "Houses of Stones"
..$ .attrs: Named chr "text"
.. ..- attr(*, "names")= chr "type"
$ subtitle:List of 2
..$ text : chr "\"Science is facts; just as houses are made of stones, so is science made of facts; but a pile of stones is not a house and a c"| __truncated__
..$ .attrs: Named chr "text"
.. ..- attr(*, "names")= chr "type"
$ updated : chr "2013-05-16T12:16:49Z"
$ link : Named chr [1:3] "alternate" "text/html" "http://housesofstones.com/blog"
..- attr(*, "names")= chr [1:3] "rel" "type" "href"
$ id : chr "http://housesofstones.com/blog/feed/atom/"
$ link : Named chr [1:3] "self" "application/atom+xml" "http://housesofstones.com/blog/feed/atom/"
..- attr(*, "names")= chr [1:3] "rel" "type" "href"
Or, using your example data:
# A literal "&" inside an XML attribute value has to be written as "&amp;"
example_data <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?><service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"><workspace><atom:title>OperationallyAvailableCapacity</atom:title><collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&amp;AssetNbr=51&amp;beg_date=05%2F03%2F2013%2000%3A00%3A00&amp;LocationNbr=%25&amp;LocationProp=%25&amp;LocationName=%25&amp;DirOfLow=%25&amp;rs%3AParameterLanguage=&amp;rs%3ACommand=Render&amp;rs%3AFormat=ATOM&amp;rc%3ADataFeed=xAx0x13"><atom:title>table1</atom:title></collection></workspace></service>'
xml_data <- xmlParse(example_data)
# Convert the xml structure to a list so you can work with it in R
xml_list <- xmlToList(xml_data)
str(xml_list)
List of 1
$ workspace:List of 2
..$ title : chr "OperationallyAvailableCapacity"
..$ collection:List of 2
.. ..$ title : chr "table1"
.. ..$ .attrs: Named chr "http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3"| __truncated__
.. .. ..- attr(*, "names")= chr "href"
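If you only need the href attribute, an XPath query is an alternative to converting the whole document to a list. The following is a minimal sketch, not necessarily the best approach for your feed: the document is a trimmed-down stand-in with an illustrative href, and the prefix name "app" is an arbitrary choice bound to the document's default namespace.

```r
library(XML)

# Trimmed-down stand-in for the service document; the href is illustrative only
doc_text <- '<service xmlns:atom="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/2007/app"><workspace><collection href="http://example.com/report?a=1&amp;b=2"><atom:title>table1</atom:title></collection></workspace></service>'
doc <- xmlParse(doc_text, asText = TRUE)

# The elements sit in a default namespace, so bind that URI to a prefix
ns <- c(app = "http://www.w3.org/2007/app")

# Pull the href attribute from every <collection> node
hrefs <- xpathSApply(doc, "//app:collection", xmlGetAttr, "href", namespaces = ns)
hrefs
```

The parser turns "&amp;" back into "&", so the value you get is the plain URL string.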
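Edit
On closer inspection, your particular example data seems, for some reason, to keep a lot of information in a single node, URL-encoded. If you want that data, you will need to pull it out.
First, grab that single node and decode the URL so that it is easier to parse: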
xml_content <- URLdecode(xml_list$workspace$collection$.attrs)
The parameters are separated by "&", so you can split the string on that character.
xml_content <- unlist(strsplit(xml_content, "&"))
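One caveat with decoding before splitting: if a value ever contained an encoded ampersand (%26), it would turn into a real "&" and be split in the wrong place. That does not happen in this example, but a sketch of the safer order (split on the raw "&" first, then decode each piece) could look like this, using base R only. The variable names href, pieces and decoded are just for illustration.

```r
# The href attribute from the example data, still URL-encoded
href <- "http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3A00&LocationNbr=%25&LocationProp=%25&LocationName=%25&DirOfLow=%25&rs%3AParameterLanguage=&rs%3ACommand=Render&rs%3AFormat=ATOM&rc%3ADataFeed=xAx0x13"

# Split on the literal "&" while everything is still encoded
pieces <- strsplit(href, "&", fixed = TRUE)[[1]]

# Decode each piece on its own
decoded <- vapply(pieces, URLdecode, character(1), USE.NAMES = FALSE)
decoded
```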
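Each new string now holds a parameter name and its value, separated by an equals sign. There are several ways to pull the two apart. Perhaps the simplest is the str_split_fixed function from the stringr package: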
require(stringr)
str_split_fixed(xml_content, "=", 2)
[,1] [,2]
[1,] "http://10.101.111.234/ReportServer?/InfoPost/OperationallyAvailableCapacity" ""
[2,] "AssetNbr" "51"
[3,] "beg_date" "05/03/2013 00:00:00"
[4,] "LocationNbr" "%"
[5,] "LocationProp" "%"
[6,] "LocationName" "%"
[7,] "DirOfLow" "%"
[8,] "rs:ParameterLanguage" ""
[9,] "rs:Command" "Render"
[10,] "rs:Format" "ATOM"
[11,] "rc:DataFeed" "xAx0x13"
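If you would rather look parameters up by name than by row, the same strings can be turned into a named character vector. This is a minimal base-R sketch under the assumption that every parameter contains an "=" and that the first element (the report URL) can be dropped; the names has_eq, pairs, eq_pos and params are made up for the example.

```r
# Decoded strings, as produced by the decode and split steps
xml_content <- c(
  "http://10.101.111.234/ReportServer?/InfoPost/OperationallyAvailableCapacity",
  "AssetNbr=51", "beg_date=05/03/2013 00:00:00", "LocationNbr=%",
  "LocationProp=%", "LocationName=%", "DirOfLow=%",
  "rs:ParameterLanguage=", "rs:Command=Render", "rs:Format=ATOM",
  "rc:DataFeed=xAx0x13"
)

# Keep only the "name=value" elements; the report URL has no "="
has_eq <- grepl("=", xml_content, fixed = TRUE)
pairs <- xml_content[has_eq]

# Split each string at its first "=" only, so values may contain "=" themselves
eq_pos <- as.integer(regexpr("=", pairs, fixed = TRUE))
params <- setNames(substring(pairs, eq_pos + 1),
                   substring(pairs, 1, eq_pos - 1))

params[["AssetNbr"]]
params[["beg_date"]]
```

Empty values such as rs:ParameterLanguage come back as empty strings rather than being dropped.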