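I have some data that was converted from a KML file to XML, and I'm curious how I could use PHP or Ruby to pull out things like the neighborhood names and coordinates. I know how to do it when the values have a tag of their own around them, like so: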

<cities>
  <neighborhood>Gotham</neighborhood>
</cities>

Unfortunately, the data is formatted as:

<SimpleData name="neighborhd">Colgate Center</SimpleData>

instead of

<neighborhd>Colgate Center</neighborhd>

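Here is the KML source:

How can I use PHP or Ruby to extract data from something like this? I've installed a few Ruby gems for parsing XML, but XML is just something I haven't worked with before.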

1 Answer

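Your XML is not well-formed (the sample is cut off partway through the second Placemark), but Nokogiri will attempt to fix it up.

Here's how to check for invalid XML/XHTML/HTML, and how to rewrite the parts you want.

Here's the setup: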

require 'nokogiri'

doc = Nokogiri.XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
  <Document>
    <Schema name="Sample_Neighborhoods_Samples" id="Sample_Neighborhoods_Samples">
      <SimpleField type="int" name="nid"/>
      <SimpleField type="string" name="neighborhd"/>
      <SimpleField type="string" name="place"/>
      <SimpleField type="string" name="placecode"/>
      <SimpleField type="string" name="nbr_type"/>
      <SimpleField type="string" name="po_name"/>
      <SimpleField type="string" name="metro"/>
      <SimpleField type="string" name="country"/>
      <SimpleField type="string" name="state"/>
      <SimpleField type="string" name="statefips"/>
      <SimpleField type="string" name="county"/>
      <SimpleField type="string" name="countyfips"/>
      <SimpleField type="string" name="mcd"/>
      <SimpleField type="string" name="mcdfips"/>
      <SimpleField type="string" name="cbsa"/>
      <SimpleField type="string" name="cbsacode"/>
      <SimpleField type="string" name="cbsatype"/>
      <SimpleField type="double" name="cenlat"/>
      <SimpleField type="double" name="cenlon"/>
      <SimpleField type="int" name="color"/>
      <SimpleField type="string" name="ncs_code"/>
      <SimpleField type="string" name="release"/>
    </Schema>
    <Style id="KMLSTYLER_6">
      <LabelStyle>
        <scale>1.0</scale>
      </LabelStyle>
      <LineStyle>
        <colorMode>normal</colorMode>
      </LineStyle>
      <PolyStyle>
        <color>7f4080ff</color>
        <colorMode>random</colorMode>
      </PolyStyle>
    </Style>
    <name>Sample_Neighborhoods_NYC</name>
    <visibility>1</visibility>
    <Folder id="kml_ft_Sample_Neighborhoods_Samples">
      <name>Sample_Neighborhoods_Samples</name>
      <Folder id="kml_ft_Sample_Neighborhoods_Samples_Sample_Neighborhoods_NYC">
        <name>Sample_Neighborhoods_NYC</name>
        <Placemark id="kml_1">
          <name>Colgate Center</name>
          <Snippet> </Snippet>
          <styleUrl>#KMLSTYLER_6</styleUrl>
          <ExtendedData>
            <SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
              <SimpleData name="nid">7086</SimpleData>
              <SimpleData name="neighborhd">Colgate Center</SimpleData>
              <SimpleData name="place">Jersey City</SimpleData>
              <SimpleData name="placecode">36000</SimpleData>
              <SimpleData name="nbr_type">S</SimpleData>
              <SimpleData name="po_name">JERSEY CITY</SimpleData>
              <SimpleData name="metro">New York City, NY</SimpleData>
              <SimpleData name="country">USA</SimpleData>
              <SimpleData name="state">NJ</SimpleData>
              <SimpleData name="statefips">34</SimpleData>
              <SimpleData name="county">Hudson</SimpleData>
              <SimpleData name="countyfips">34017</SimpleData>
              <SimpleData name="mcd">Jersey City</SimpleData>
              <SimpleData name="mcdfips">36000</SimpleData>
              <SimpleData name="cbsa">New York-Northern New Jersey-Long Island, NY-NJ-PA</SimpleData>
              <SimpleData name="cbsacode">35620</SimpleData>
              <SimpleData name="cbsatype">Metro</SimpleData>
              <SimpleData name="cenlat">40.7145135000001</SimpleData>
              <SimpleData name="cenlon">-74.0343385</SimpleData>
              <SimpleData name="color">1</SimpleData>
              <SimpleData name="ncs_code">40910000</SimpleData>
              <SimpleData name="release">1.12.2</SimpleData>
            </SchemaData>
          </ExtendedData>
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0                     -74.035535,40.7122010000001,0 -74.0348299999999,40.71209,0 -74.034903,40.711804,0 -74.033761,40.7116560000001,0 -74.0334089999999,40.7121090000001,0 -74.032996,40.7141330000001,0 -74.0331899999999,40.7141790000001,0 -74.032656,40.7162500000001,0 -74.032231,40.716194,0 -74.032049,40.716908,0 -74.033871,40.7170370000001,0 -74.035629,40.7173710000001,0 -74.035669,40.7171650000001,0 -74.036009,40.715335,0 -74.036325,40.713625,0 -74.036482,40.7123580000001,0 -74.036628,40.712211,0 </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>
        </Placemark>
        <Placemark id="kml_2">
          <name>Colgate Center</name>
          <Snippet> </Snippet>
          <ExtendedData>
EOT

Here's how to see whether there were errors. Any time doc.errors isn't empty, you have a problem.

puts doc.errors

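This is one way to find the SimpleData nodes throughout the document. I prefer CSS selectors over XPath for readability. Sometimes XPath is the better choice because it allows finer granularity when searching. You need to learn both.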

doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  puts "<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name]
end

Here's the output after running:

Premature end of data in tag ExtendedData line 87
Premature end of data in tag Placemark line 84
Premature end of data in tag Folder line 44
Premature end of data in tag Folder line 42
Premature end of data in tag Document line 3
Premature end of data in tag kml line 2
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>

I wasn't trying to modify the DOM, but it's easy to do:

doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  simple_data.replace("<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name])
end

puts doc.to_xml

After running, this is the affected section:

<ExtendedData>
  <SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
    <nid>7086</nid>
    <neighborhd>Colgate Center</neighborhd>
    <place>Jersey City</place>
    <placecode>36000</placecode>
    <nbr_type>S</nbr_type>
    <po_name>JERSEY CITY</po_name>
    <metro>New York City, NY</metro>
    <country>USA</country>
    <state>NJ</state>
    <statefips>34</statefips>
    <county>Hudson</county>
    <countyfips>34017</countyfips>
    <mcd>Jersey City</mcd>
    <mcdfips>36000</mcdfips>
    <cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
    <cbsacode>35620</cbsacode>
    <cbsatype>Metro</cbsatype>
    <cenlat>40.7145135000001</cenlat>
    <cenlon>-74.0343385</cenlon>
    <color>1</color>
    <ncs_code>40910000</ncs_code>
    <release>1.12.2</release>
  </SchemaData>
</ExtendedData>
answered 2013-05-31T17:03:33.287