
I want to use R to extract the values in the description of a .kml file.

Here is the file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
 xmlns:gx="http://www.google.com/kml/ext/2.2"
 xmlns:atom="http://www.w3.org/2005/Atom">
 <Document>
 <open>1</open>
 <visibility>1</visibility>
 <name><![CDATA[2013-07-06 4:18pm]]></name>
 ...
 <Placemark>
 <name><![CDATA[2013-07-06 4:18pm (Start)]]></name>
 <description><![CDATA[]]></description>
 <TimeStamp><when>2013-07-06T20:18:56.000Z</when></TimeStamp>
 <styleUrl>#start</styleUrl>
 <Point>
 <coordinates>-78.353348,45.020615,340.29998779296875</coordinates>
 </Point>
 </Placemark>
 <Placemark id="tour">
 <name><![CDATA[2013-07-06 4:18pm]]></name>
 <description><![CDATA[]]></description>
 ...
 <gx:Track>
 <when>2013-07-06T20:18:56.000Z</when>
 <gx:coord>-78.353348 45.020615 340.29998779296875</gx:coord>
 <when>2013-07-06T20:19:12.000Z</when>
 <gx:coord>-78.353315 45.020644 340.29998779296875</gx:coord>
 <when>2013-07-06T22:12:23.000Z</when>
 <gx:coord>-78.353108 45.020736 342.29998779296875</gx:coord>
 <ExtendedData>
  ...
  <Placemark>
  <name><![CDATA[2013-07-06 4:18pm (End)]]></name>
  <description><![CDATA[Created by Google My Tracks on Android.

  Name: 2013-07-06 4:18pm
  Activity type: cycling
  Description: -
  Total distance: 49.62 km (30.8 mi)
  Total time: 1:53:28
  Moving time: 1:50:17
  Average speed: 26.24 km/h (16.3 mi/h)
  Average moving speed: 27.00 km/h (16.8 mi/h)
 Max speed: 61.20 km/h (38.0 mi/h)
 Average pace: 2.29 min/km (3.7 min/mi)
 Average moving pace: 2.22 min/km (3.6 min/mi)
 Fastest pace: 0.98 min/km (1.6 min/mi)
 Max elevation: 406 m (1333 ft)
 Min elevation: 265 m (868 ft)
 Elevation gain: 690 m (2263 ft)
 Max grade: 12 %
 Min grade: -11 %
 Recorded: 2013-07-06 4:18pm
  ]]></description>
 ...
 </Placemark>
 </Document>
 </kml>

This is the content I want to extract. It is contained in

 <description><![CDATA[Created by Google My Tracks on Android.: ]]></description>

i.e.:

  Name: 2013-07-06 4:18pm
  Activity type: cycling
  Description: -
  Total distance: 49.62 km (30.8 mi)
  Total time: 1:53:28
  Moving time: 1:50:17
  Average speed: 26.24 km/h (16.3 mi/h)
  Average moving speed: 27.00 km/h (16.8 mi/h)
 Max speed: 61.20 km/h (38.0 mi/h)
 Average pace: 2.29 min/km (3.7 min/mi)
 Average moving pace: 2.22 min/km (3.6 min/mi)
 Fastest pace: 0.98 min/km (1.6 min/mi)
 Max elevation: 406 m (1333 ft)
 Min elevation: 265 m (868 ft)
 Elevation gain: 690 m (2263 ft)
 Max grade: 12 %
 Min grade: -11 %
 Recorded: 2013-07-06 4:18pm

xmlToList gives me NULL, I think because the CDATA markup means the parser doesn't process what follows:

library(XML)
xml <- xmlTreeParse("test1.kml", useInternalNodes=TRUE)
xmllist <- xmlToList(xml)
xmllist$Document$Placemark$description
[[1]]
NULL

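I think this is what is meant by "The term CDATA is used about text data that should not be parsed by the XML parser... Everything inside a CDATA section is ignored by the parser. A CDATA section starts with '<![CDATA['".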

The following doesn't work for me either, possibly for the same CDATA-related reason:

z1 <- xpathApply(xml, "//description", xmlValue)
z1
list()
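The empty list() is more likely a namespace issue than a CDATA one: the kml root element declares a default namespace (xmlns="http://www.opengis.net/kml/2.2"), and in XPath 1.0 an unprefixed name such as description only matches elements that have no namespace. A minimal sketch using a hypothetical cut-down KML string (not the real file) shows the difference when a prefix is bound through the namespaces argument of the XML package's xpathApply/xpathSApply; the prefix name "k" is an arbitrary choice.

```r
library(XML)

# Hypothetical minimal KML: same default namespace as the real file, one description
kml_txt <- paste0(
  '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>',
  '<description><![CDATA[Total time: 1:53:28]]></description>',
  '</Placemark></Document></kml>'
)
doc <- xmlParse(kml_txt, asText = TRUE)

# Unprefixed XPath names only match elements in no namespace, so nothing is found
plain <- xpathApply(doc, "//description", xmlValue)

# Bind the arbitrary prefix "k" to the KML namespace and use it in the path
ns <- c(k = "http://www.opengis.net/kml/2.2")
desc <- xpathSApply(doc, "//k:description", xmlValue, namespaces = ns)

print(length(plain))
print(desc)
```

Against the real file, the same prefixed path should return the text of every description element, including the CDATA content.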

Can anyone help me extract the text in the file?

Here is a link to the file: https://docs.google.com/file/d/0B__iOdFGJbXYOHJGbWJVNW0tS3M/edit?usp=sharing


3 Answers

library(XML)
doc <- xmlTreeParse("test1.kml", useInternalNodes = TRUE)
root <- xmlRoot(doc)

xmlValue(root[["Document"]][["name"]])

R> xmlValue(root[["Document"]][["name"]])
 [1] "2013-07-06 4:18pm"

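xmlToDataFrame(root) and xmlToDataFrame(doc) also return this value in the name column. Using xmlToList on either root or doc returns NULL for the value of any CDATA. I looked at the name node because your example doesn't parse with xmlParse when copied and pasted. From my own small tests, this approach should work for any CDATA.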
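The claim that xmlValue() returns CDATA text can be checked with a minimal self-contained sketch. It parses an inline string (a hypothetical stand-in for test1.kml, without the KML namespace) rather than the file:

```r
library(XML)

# Tiny inline document whose <name> element holds a CDATA section,
# mirroring the <name><![CDATA[...]]></name> nodes in the KML sample
doc <- xmlParse("<Document><name><![CDATA[2013-07-06 4:18pm]]></name></Document>",
                asText = TRUE)
root <- xmlRoot(doc)

# xmlValue() reads the CDATA content as ordinary text instead of skipping it
name_value <- xmlValue(root[["name"]])
print(name_value)
```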

answered 2013-07-09T05:31:07.373

Jake Burkhead answered this question in the comments, and his solution did the trick. I'm very grateful. Here is how to extract the text from the .kml file:

> xml1 <- xmlTreeParse("2013-07-06 4-18pm.kml", useInternalNodes=TRUE)
> root <-xmlRoot(xml1)
> names(root[["Document"]])
  open   visibility         name       author        Style        Style        Style        Style 
  "open" "visibility"       "name"     "author"      "Style"      "Style"      "Style"      "Style" 
   Style       Schema    Placemark    Placemark    Placemark 
 "Style"     "Schema"  "Placemark"  "Placemark"  "Placemark" 
> # note that I want the text in the third "Placemark" which is in position [13] so:
> xmlValue(root[["Document"]][[13]][["description"]])
 [1] "Created by Google My Tracks on Android.\n\nName: 2013-07-06 4:18pm\nActivity type:          cycling\nDescription: -\nTotal distance: 49.62 km (30.8 mi)\nTotal time: 1:53:28\nMoving time: 1:50:17\nAverage speed: 26.24 km/h (16.3 mi/h)\nAverage moving speed: 27.00 km/h (16.8 mi/h)\nMax speed: 61.20 km/h (38.0 mi/h)\nAverage pace: 2.29 min/km (3.7 min/mi)\nAverage moving pace: 2.22 min/km (3.6 min/mi)\nFastest pace: 0.98 min/km (1.6 min/mi)\nMax elevation: 406 m (1333 ft)\nMin elevation: 265 m (868 ft)\nElevation gain: 690 m (2263 ft)\nMax grade: 12 %\nMin grade: -11 %\nRecorded: 2013-07-06 4:18pm\n"
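Once the description is a single string like the one above, the individual values can be pulled out with base R. A minimal sketch, assuming every field follows the "Key: value" pattern shown in that output; parse_track_description is a hypothetical helper, not part of the XML package or My Tracks:

```r
# Hypothetical helper: turn a My Tracks description string into a named
# character vector of "Key: value" pairs
parse_track_description <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  # Keep only lines containing ": "; this drops the "Created by ..." header and blanks
  kv <- lines[grepl(": ", lines, fixed = TRUE)]
  # Split each line at its FIRST ": " so values such as "1:53:28" stay intact
  pos <- regexpr(": ", kv, fixed = TRUE)
  keys <- substr(kv, 1, pos - 1)
  vals <- trimws(substr(kv, pos + 2, nchar(kv)))
  names(vals) <- keys
  vals
}

# Shortened copy of the string returned by xmlValue() above
txt <- paste0("Created by Google My Tracks on Android.\n\n",
              "Name: 2013-07-06 4:18pm\nActivity type:          cycling\n",
              "Total distance: 49.62 km (30.8 mi)\nTotal time: 1:53:28\n")
info <- parse_track_description(txt)

# Numeric distance in km: drop everything from " km" onwards
km <- as.numeric(sub(" km.*", "", info[["Total distance"]]))
print(info)
print(km)
```

Splitting at the first ": " rather than at every ":" is the main design choice here; it keeps times and the "Recorded" timestamp whole.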

I've accepted the answer, but I thought I'd post the complete solution here in case it helps someone else.

Many thanks for your persistence, Jake. Thanks also to Ricardo and agstudy.

answered 2013-07-09T17:05:58.857

A tidy solution to this problem is to read the data in with the xml2 package:

library(xml2)
# Instead of xmlTreeParse
doc <- read_xml("test1.kml", options = "NOCDATA")

You can then simply retrieve the CDATA with xml_text():

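# Instead of xmllist$Document$Placemark$description
library(xml2)
library(magrittr)  # provides %>%
doc <- read_xml("test1.kml", options = "NOCDATA")
# KML declares a default namespace; strip it so unprefixed XPath names match
xml_ns_strip(doc)
doc %>%
  xml_find_all("//Placemark/description") %>%
  xml_text()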

answered 2020-05-12T21:48:51.900