Your XML is invalid, but Nokogiri will try to repair it.
Here is how to check for invalid XML/XHTML/HTML, and how to rewrite the parts you want.
Here's the setup:
require 'nokogiri'
doc = Nokogiri.XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<Schema name="Sample_Neighborhoods_Samples" id="Sample_Neighborhoods_Samples">
<SimpleField type="int" name="nid"/>
<SimpleField type="string" name="neighborhd"/>
<SimpleField type="string" name="place"/>
<SimpleField type="string" name="placecode"/>
<SimpleField type="string" name="nbr_type"/>
<SimpleField type="string" name="po_name"/>
<SimpleField type="string" name="metro"/>
<SimpleField type="string" name="country"/>
<SimpleField type="string" name="state"/>
<SimpleField type="string" name="statefips"/>
<SimpleField type="string" name="county"/>
<SimpleField type="string" name="countyfips"/>
<SimpleField type="string" name="mcd"/>
<SimpleField type="string" name="mcdfips"/>
<SimpleField type="string" name="cbsa"/>
<SimpleField type="string" name="cbsacode"/>
<SimpleField type="string" name="cbsatype"/>
<SimpleField type="double" name="cenlat"/>
<SimpleField type="double" name="cenlon"/>
<SimpleField type="int" name="color"/>
<SimpleField type="string" name="ncs_code"/>
<SimpleField type="string" name="release"/>
</Schema>
<Style id="KMLSTYLER_6">
<LabelStyle>
<scale>1.0</scale>
</LabelStyle>
<LineStyle>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>7f4080ff</color>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<name>Sample_Neighborhoods_NYC</name>
<visibility>1</visibility>
<Folder id="kml_ft_Sample_Neighborhoods_Samples">
<name>Sample_Neighborhoods_Samples</name>
<Folder id="kml_ft_Sample_Neighborhoods_Samples_Sample_Neighborhoods_NYC">
<name>Sample_Neighborhoods_NYC</name>
<Placemark id="kml_1">
<name>Colgate Center</name>
<Snippet> </Snippet>
<styleUrl>#KMLSTYLER_6</styleUrl>
<ExtendedData>
<SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
<SimpleData name="nid">7086</SimpleData>
<SimpleData name="neighborhd">Colgate Center</SimpleData>
<SimpleData name="place">Jersey City</SimpleData>
<SimpleData name="placecode">36000</SimpleData>
<SimpleData name="nbr_type">S</SimpleData>
<SimpleData name="po_name">JERSEY CITY</SimpleData>
<SimpleData name="metro">New York City, NY</SimpleData>
<SimpleData name="country">USA</SimpleData>
<SimpleData name="state">NJ</SimpleData>
<SimpleData name="statefips">34</SimpleData>
<SimpleData name="county">Hudson</SimpleData>
<SimpleData name="countyfips">34017</SimpleData>
<SimpleData name="mcd">Jersey City</SimpleData>
<SimpleData name="mcdfips">36000</SimpleData>
<SimpleData name="cbsa">New York-Northern New Jersey-Long Island, NY-NJ-PA</SimpleData>
<SimpleData name="cbsacode">35620</SimpleData>
<SimpleData name="cbsatype">Metro</SimpleData>
<SimpleData name="cenlat">40.7145135000001</SimpleData>
<SimpleData name="cenlon">-74.0343385</SimpleData>
<SimpleData name="color">1</SimpleData>
<SimpleData name="ncs_code">40910000</SimpleData>
<SimpleData name="release">1.12.2</SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0 -74.035535,40.7122010000001,0 -74.0348299999999,40.71209,0 -74.034903,40.711804,0 -74.033761,40.7116560000001,0 -74.0334089999999,40.7121090000001,0 -74.032996,40.7141330000001,0 -74.0331899999999,40.7141790000001,0 -74.032656,40.7162500000001,0 -74.032231,40.716194,0 -74.032049,40.716908,0 -74.033871,40.7170370000001,0 -74.035629,40.7173710000001,0 -74.035669,40.7171650000001,0 -74.036009,40.715335,0 -74.036325,40.713625,0 -74.036482,40.7123580000001,0 -74.036628,40.712211,0 </coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark id="kml_2">
<name>Colgate Center</name>
<Snippet> </Snippet>
<ExtendedData>
EOT
Here's how to see whether there were errors. Any time doc.errors is not empty, you have a problem.
puts doc.errors
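Here's one way to find the SimpleData nodes throughout the document. For readability, I prefer CSS accessors over XPath. Sometimes XPath is the better choice because it allows finer granularity when searching. You need to learn both.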
doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  puts "<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name]
end
This is the output after running it:
Premature end of data in tag ExtendedData line 87
Premature end of data in tag Placemark line 84
Premature end of data in tag Folder line 44
Premature end of data in tag Folder line 42
Premature end of data in tag Document line 3
Premature end of data in tag kml line 2
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>
I wasn't trying to modify the DOM, but it's easy to do:
doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  simple_data.replace("<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name])
end
puts doc.to_xml
After running it, this is the affected section:
<ExtendedData>
<SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>
</SchemaData>
</ExtendedData>