python - Reading a collection of extension elements in an RSS feed with Universal Feed Parser

Question

Is there any way to read a collection of extension elements with Universal Feed Parser?

This is just a short snippet from a Kuler RSS feed:

<channel>
  <item>
    <!-- snip: regular RSS elements -->
    <kuler:themeItem>
      <kuler:themeID>123456</kuler:themeID>
      <!-- snip -->
      <kuler:themeSwatches>
        <kuler:swatch>
          <kuler:swatchHexColor>FFFFFF</kuler:swatchHexColor>
          <!-- snip -->
        </kuler:swatch>
        <kuler:swatch>
          <kuler:swatchHexColor>000000</kuler:swatchHexColor>
          <!-- snip -->
        </kuler:swatch>
      </kuler:themeSwatches>
    </kuler:themeItem>
  </item>
</channel>

I tried the following:

>>> import feedparser
>>> feed = feedparser.parse(url)
>>> feed.channel.title
u'kuler highest rated themes'
>>> feed.entries[0].title
u'Foobar'
>>> feed.entries[0].kuler_themeid
u'123456'
>>> feed.entries[0].kuler_swatch
u''

feed.entries[0].kuler_swatchhexcolor returns only the last kuler:swatchHexColor element. Is there any way to retrieve all of these elements with feedparser?

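I have already solved this using minidom, but I would like to use Universal Feed Parser if possible (because its API is so simple). Can it be extended? I couldn't find anything about this in the documentation, so if anyone has more experience with the library, please let me know.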
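The minidom workaround itself isn't shown. As a minimal sketch of what it could look like with the standard library's xml.dom.minidom, assuming the feed declares the kuler prefix (the namespace URI, SAMPLE_FEED, and the helper names text_of and read_themes are all hypothetical placeholders): a namespace-aware parser rejects an undeclared prefix, and getElementsByTagName matches the qualified name including the prefix.

```python
from xml.dom import minidom

# Hypothetical feed text modelled on the snippet in the question.
# The namespace URI is a placeholder, not Kuler's real one.
SAMPLE_FEED = """<rss xmlns:kuler="http://example.com/kuler-placeholder">
<channel>
  <item>
    <kuler:themeItem>
      <kuler:themeID>123456</kuler:themeID>
      <kuler:themeSwatches>
        <kuler:swatch><kuler:swatchHexColor>FFFFFF</kuler:swatchHexColor></kuler:swatch>
        <kuler:swatch><kuler:swatchHexColor>000000</kuler:swatchHexColor></kuler:swatch>
      </kuler:themeSwatches>
    </kuler:themeItem>
  </item>
</channel>
</rss>"""


def text_of(element):
    """Concatenate the direct text-node children of an element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def read_themes(xml_text):
    """Return a list of (theme_id, [hex colors]) tuples, one per kuler:themeItem."""
    doc = minidom.parseString(xml_text)
    themes = []
    # getElementsByTagName searches all descendants and compares the
    # qualified tag name, so the "kuler:" prefix must be included.
    for theme in doc.getElementsByTagName("kuler:themeItem"):
        theme_id = text_of(theme.getElementsByTagName("kuler:themeID")[0])
        colors = [text_of(swatch_color) for swatch_color
                  in theme.getElementsByTagName("kuler:swatchHexColor")]
        themes.append((theme_id, colors))
    doc.unlink()  # free the DOM's internal cycles once we're done
    return themes
```

Unlike feedparser's flattened kuler_swatchhexcolor key, this keeps every swatch color, in document order, grouped by theme.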

score 3 · Accepted Answer

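Universal Feed Parser is really nice for most feeds, but for extended feeds you might want to try something called BeautifulSoup. It's an XML/HTML/XHTML parsing library originally designed for screen scraping; it turns out it's also brilliant for this sort of thing. The documentation is excellent and the API is self-explanatory, so if you're thinking of using anything else, that's what I'd recommend.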

I'd probably use it like this:

>>> import BeautifulSoup
>>> import urllib2

# Fetch the RSS (XML) data from the URL
>>> connection = urllib2.urlopen('http://kuler.adobe.com/path/to/rss.xml')
>>> html_data = connection.read()
>>> connection.close()

# Create and search the soup
>>> soup = BeautifulSoup.BeautifulSoup(html_data)
>>> themes = soup.findAll('kuler:themeitem') # Note: all lower-case element names

# Get the ID of the first theme
>>> themes[0].find('kuler:themeid').contents[0]
u'123456'

# Get an ordered list of the hex colors for the first theme
>>> themeswatches = themes[0].find('kuler:themeswatches')
>>> colors = [color.contents[0] for color in
... themeswatches.findAll('kuler:swatchhexcolor')]
>>> colors
[u'FFFFFF', u'000000']
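BeautifulSoup 3 and urllib2, as used above, are Python 2 modules. As a hedged alternative for Python 3 (a different technique: the standard library's namespace-aware xml.etree.ElementTree rather than BeautifulSoup), the same extraction might look like the sketch below. With a real XML parser, element names keep their original case (themeItem, swatchHexColor) instead of the lower-cased names the BeautifulSoup example relies on. The namespace URI, SAMPLE_FEED, and the helpers fetch and theme_colors are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Placeholder: use whatever URI the real feed declares for xmlns:kuler.
KULER_NS = {"kuler": "http://example.com/kuler-placeholder"}

# Hypothetical sample modelled on the snippet in the question.
SAMPLE_FEED = """<rss xmlns:kuler="http://example.com/kuler-placeholder">
<channel>
  <item>
    <kuler:themeItem>
      <kuler:themeID>123456</kuler:themeID>
      <kuler:themeSwatches>
        <kuler:swatch><kuler:swatchHexColor>FFFFFF</kuler:swatchHexColor></kuler:swatch>
        <kuler:swatch><kuler:swatchHexColor>000000</kuler:swatchHexColor></kuler:swatch>
      </kuler:themeSwatches>
    </kuler:themeItem>
  </item>
</channel>
</rss>"""


def fetch(url):
    """Download the raw feed bytes (not called in the example below)."""
    with urlopen(url) as response:
        return response.read()


def theme_colors(xml_text):
    """Map each theme ID to its ordered list of swatch hex colors."""
    root = ET.fromstring(xml_text)
    result = {}
    for theme in root.iterfind(".//kuler:themeItem", KULER_NS):
        theme_id = theme.findtext("kuler:themeID", namespaces=KULER_NS)
        colors = [el.text for el in theme.iterfind(
            "kuler:themeSwatches/kuler:swatch/kuler:swatchHexColor", KULER_NS)]
        result[theme_id] = colors
    return result
```

Calling theme_colors(SAMPLE_FEED) returns {'123456': ['FFFFFF', '000000']}; for a live feed you would pass fetch(url) instead, since ET.fromstring accepts bytes as well as str.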

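So you can probably see that it's a very cool library. It wouldn't be so good if you were parsing any old RSS feed, but since the data comes from Adobe Kuler, you can be fairly sure its structure won't vary enough to break your app (i.e. it's a trusted enough source).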

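Worse still is trying to parse Adobe's damn .ASE format. I tried writing a parser for it and it got really horrible, really quickly. Ugh. So yes, the RSS feeds are probably the easiest way to interface with Kuler.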
