python - feedparser returns inconsistent items

Question

I am running these lines:

import feedparser
url = 'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/feed.xml'
feed = feedparser.parse(url)
items = feed['items']
print(items[0]['links'][1]['href'])

These use the feedparser module. Here is a sample block from the RSS feed in question:

    <item>
    <title>More Android Annotations</title>
    <link>http://youtu.be/77pPceVicNI</link>
    <description><![CDATA[Walkthrough that goes a little bit more indepth to show you the things that <a href="http://androidannotations.org">AndroidAnnotations</a> can do for you as an application developer. <br /><a href="https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4">Direct download link <i>(rightclick and choose save as)</i></a>]]></description>
    <image>
        <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
        <link>https://github.com/FoamyGuy/StackSites</link>
        <title>More Android Annotations</title>
    </image>
  </item>

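I am trying to get the https://github.com/FoamyGuy/StackSites part of that item. On my local PC (Win7, Python 2.6) this works as expected. However, when I run the same lines in a console on pythonanywhere.com, instead of my GitHub link I get https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4, which is the mp4 link near the end of the CDATA in the description.

On both machines items[0]['links'] contains only 2 elements (index 0 and 1), but the string value at index 1 differs between the two machines. Why does feedparser return a different value on one machine than on the other?

I have printed the entire contents of items[0] on pythonanywhere, and my GitHub link is not in it at all. Is there some parameter I can use to change how the feed is parsed so that I can reliably get the GitHub link out of it?

Is there some other feed parsing module that would work better for me and, hopefully, be more consistent across machines?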

score 0 · Accepted Answer

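I experimented with your feed, and it looks like each item has two entries in 'links', but they appear to be consistently distinguishable: one has rel="alternate" and one has rel="enclosure".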

In [8]: items[0]['links']
Out[8]:
[{'href': u'http://youtu.be/NL7szHeEiCs',
  'rel': u'alternate',
  'type': u'text/html'},
 {u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/ButtonExample.mp4',
  'rel': u'enclosure'}]

In [9]: items[1]['links']
Out[9]:
[{'href': u'http://youtu.be/77pPceVicNI',
  'rel': u'alternate',
  'type': u'text/html'},
 {u'href': u'https://dl.dropboxusercontent.com/u/5724095/TutorialFeed/StackSitesAnnotations.mp4',
  'rel': u'enclosure'}]

So, can you use that to get what you want?

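def get_alternate_link(item):
    """Return the href of the first link with rel="alternate", or None."""
    for link in item.links:
        # Select by rel rather than by list position.
        if link.get('rel') == 'alternate':
            return link.get('href')
    return None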
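
For the GitHub link specifically, note that in the sample item it lives in the nested <image><link> element, and neither of the 'links' entries shown above carries it. If feedparser does not expose it on a given machine, one possible fallback is to read that element straight from the XML with the standard library. This is only a rough sketch, not something tested against the live feed: get_image_links and SAMPLE_RSS are made-up names, the <rss>/<channel> wrapper around the item is an assumption about the rest of the feed, and the sample text is a trimmed copy of the item from the question. Element.iter needs Python 2.7 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <item> from the question, wrapped in an assumed
# <rss>/<channel> skeleton so the snippet runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>More Android Annotations</title>
      <link>http://youtu.be/77pPceVicNI</link>
      <image>
        <url>https://dl.dropboxusercontent.com/u/5724095/images/Githubpics/moreAnnotations.png</url>
        <link>https://github.com/FoamyGuy/StackSites</link>
        <title>More Android Annotations</title>
      </image>
    </item>
  </channel>
</rss>"""


def get_image_links(rss_text):
    """Return the <image><link> text of every <item>, None where it is absent.

    Hypothetical helper: it bypasses feedparser and reads the XML directly.
    """
    root = ET.fromstring(rss_text)
    # findtext returns None when the path does not match anything.
    return [item.findtext('image/link') for item in root.iter('item')]


image_links = get_image_links(SAMPLE_RSS)
print(image_links[0])
```

With the real feed you would download the XML text yourself and pass it to the helper. The result does not depend on how feedparser decides to populate 'links', which may help with the cross-machine difference.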
