
I'm trying to parse the following RSS feed from NOAA: http://www.nhc.noaa.gov/rss_examples/gis-ep-20130530.xml

It works fine except for this section:

    <item>
    <title>Summary - Remnants of BARBARA (EP2/EP022013)</title>
    <guid isPermaLink="false">summary-ep022013-201305302032</guid>
    <pubDate>Thu, 30 May 2013 20:32:00 GMT</pubDate>
    <author>nhcwebmaster@noaa.gov (NHC Webmaster)</author>
    <link>
    http://www.nhc.noaa.gov/text/refresh/MIATCPEP2+shtml/302031.shtml
    </link>
    <description>
    ...BARBARA DISSIPATES... ...THIS IS THE LAST ADVISORY... As of 2:00 PM PDT Thu May 30 the center of BARBARA was located at 18.5, -94.5 with movement NNW at 3 mph. The minimum central pressure was 1005 mb with maximum sustained winds of about 25 mph.
    </description>
    <gml:Point>
    <gml:pos>18.5 -94.5</gml:pos>
    </gml:Point>
    <nhc:Cyclone>
            <nhc:center>18.5, -94.5</nhc:center>
            <nhc:type>REMNANTS OF</nhc:type>
            <nhc:name>BARBARA</nhc:name>
            <nhc:wallet>EP2</nhc:wallet>
            <nhc:atcf>EP022013</nhc:atcf>
            <nhc:datetime>2:00 PM PDT Thu May 30</nhc:datetime>
            <nhc:movement>NNW at 3 mph</nhc:movement>
            <nhc:pressure>1005 mb</nhc:pressure>
            <nhc:wind>25 mph</nhc:wind>
            <nhc:headline>
            ...BARBARA DISSIPATES... ...THIS IS THE LAST ADVISORY...
            </nhc:headline>
    </nhc:Cyclone>
    </item>

feedparser doesn't parse the `<nhc:Cyclone>` element and its children. Is there a way to make sure custom tags are included in the parse?

To confirm:

    >>> import feedparser
    >>> f = feedparser.parse('http://www.nhc.noaa.gov/rss_examples/gis-ep-20130530.xml')
    >>> f.entries[1]['description']
    u'Shapefile last updated Thu, 30 May 2013 15:03:01 GMT'
    >>> f.entries[1]['nhc_cyclone']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "feedparser.py", line 375, in __getitem__
        return dict.__getitem__(self, key)
    KeyError: 'nhc_cyclone'

Output of `>>> f`: https://gist.github.com/mustafa0x/6199452


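In the current feed XML, you'll see that the custom tags are actually in entry 3, not entry 1. Also, although feedparser does handle custom tags, it renames them (for example, `nhc:center` becomes the key `nhc_center`). This is described in http://pythonhosted.org/feedparser/namespace-handling.html.

Try this (I'm using feedparser version 5.1.2):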
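Since the position of the cyclone entry can change from one version of the feed to the next, it may be safer to find it by its renamed keys than by a hard-coded index. The following is a minimal sketch; `find_cyclone_fields` is a hypothetical helper (not part of feedparser), and it assumes that every renamed key starts with `nhc_`, as in the session below.

```python
# Hypothetical helper, not part of feedparser. It assumes feedparser has
# renamed the feed's nhc:* elements to keys starting with "nhc_".
def find_cyclone_fields(entries, prefix="nhc_"):
    """Return (title, fields) for each entry that has keys starting with prefix.

    `fields` maps the element's local name (e.g. "center") to its value.
    """
    results = []
    for entry in entries:
        # Collect every prefixed key, stripping the prefix for readability.
        fields = {
            key[len(prefix):]: value
            for key, value in entry.items()
            if key.startswith(prefix)
        }
        if fields:
            results.append((entry.get("title"), fields))
    return results
```

With feedparser you would call it as `find_cyclone_fields(feedparser.parse(url).entries)`. feedparser's entry objects are `dict` subclasses, so `.items()` and `.get()` work on them directly.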

    >>> f.entries[3].title
    u'Summary - Remnants of BARBARA (EP2/EP022013)'
    >>> f.entries[3].nhc_center
    u'18.5, -94.5'
    >>> f.entries[3].nhc_type
    u'REMNANTS OF'
    >>> f.entries[3].nhc_name
    u'BARBARA'

...and likewise for the other children of `nhc:Cyclone`.
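If you would rather keep the `nhc:Cyclone` fields grouped per item instead of as separate keys on each entry, a different technique, the standard library's `xml.etree.ElementTree` rather than feedparser, can read them directly. This is a sketch with hypothetical helper names. It matches elements by local name, so the NHC namespace URI (which isn't shown in the excerpt above) doesn't need to be hard-coded.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip ElementTree's '{namespace-uri}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def cyclones_from_rss(xml_text):
    """Return (title, fields) for each RSS <item> containing a Cyclone element.

    `fields` maps each child's local name (e.g. "center") to its stripped text.
    Matching on local names avoids assuming a particular namespace URI.
    """
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):  # RSS 2.0 <item> has no namespace
        for elem in item:
            if local_name(elem.tag) == "Cyclone":
                fields = {
                    local_name(child.tag): (child.text or "").strip()
                    for child in elem
                }
                title = (item.findtext("title") or "").strip()
                results.append((title, fields))
    return results
```

`ET.fromstring` also accepts bytes, so the raw response body (for example from `urllib.request.urlopen(url).read()`) can be passed in directly.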

answered 2013-08-10T11:38:00.793