我有一个具有以下结构的巨大 kml 文件:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Style id="transBluePoly">
<LineStyle>
<width>1.5</width>
</LineStyle>
<PolyStyle>
<color>30ffa911</color>
</PolyStyle>
</Style>
<Style id="labelStyle">
<IconStyle>
<color>ffffa911</color>
<scale>0.35</scale>
</IconStyle>
<LabelStyle>
<color>ffffffff</color>
<scale>0.35</scale>
</LabelStyle>
</Style>
<Placemark>
<name>9840229084|2013-03-06 13:41:34.0|rent|0.0|2|0|0|1|T|5990F529FB98F28A1F17D182152201A4|0|null|null|null|null|null|null|null|null|null|null|F|F|0|NO_POSTCODE</name>
<styleUrl>#transBluePoly</styleUrl>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-1.5191200,53.4086600
-1.5214300,53.4011900
-1.5303600,53.4028800
-1.5435800,53.4033900
-1.5404900,53.4083600
-1.5191200,53.4086600
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark>
<name>9840031669|2013-03-06 13:14:22.0|rent|0.0|0|0|0|1|F|E5BAC836984F53F91D7F60F247920F0C|0|null|null|null|null|null|null|null|null|null|null|F|F|3641161|DE4 3JT</name>
<styleUrl>#transBluePoly</styleUrl>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-1.2370933,53.1227587
-1.2304837,53.1690463
-1.1783129,53.2226956
-1.2016444,53.2833233
-1.3213687,53.3248921
-1.4809916,53.3039582
-1.6167192,53.2438689
-1.5593782,53.1336370
-1.4296123,53.0962399
-1.3205129,53.1024090
-1.2370933,53.1227587
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
我需要从中提取 100 万个多边形以使其易于管理(知道 geo DB 是最终解决方案 - 寻找快速修复)。
将它加载到轻量级文本编辑器中并删除一些行将是我的第一个停靠点,但怀疑这将需要永远和一天的时间(它是 10 Gb,我有 16 Gb RAM)。只是想知道是否有来自 linux 终端的更智能的解决方案,可以避免将其全部读入 RAM。我已经看到 perl 和 bash 命令执行此操作,但看不到它们如何用于获取随机(或第一百万)样本:http ://www.unix.com/shell-programming-scripting/159470-filter -kml-file-xml-remove-unwanted-entries.html