python - Unable to parse an RSS feed with Python, but other RSS feed apps in Chrome can parse the data

Question

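I am writing a basic Python script to parse RSS feed data from the SEC.gov website, but the script fails when I run it. Where am I going wrong?

I am using Python 3.6.5, and I have tried both the Atoma and feedparser libraries, but I cannot extract any of the SEC RSS data. To be honest, the feed data may simply not be in a valid format (I checked it at https://validator.w3.org/feed/ and it reports the data as invalid). But when I try the same URL in a Google Chrome RSS feed extension, it works, so I must be doing something wrong. Does anyone know how to get around the format problem, or am I going about this the wrong way in Python?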

import atoma, requests

feed_name = "SEC FEED"
url ='https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001616707&type=&dateb=&owner=exclude&start=0&count=100&output=atom'
response = requests.get(url)
feed = atoma.parse_rss_bytes(response.content)

for post in feed.items:
  date = post.pub_date.strftime('(%Y/%m/%d)')
  print("post date: " + date)
  print("post title: " + post.title)
  print("post link: " + post.link)

score 1 · Accepted Answer

Here is another way to solve the problem in Python:

import requests
import feedparser
import datetime

feed_name = "SEC FEED"
url = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001616707&type=&dateb=&owner=exclude&start=0&count=100&output=atom'

# Download the feed and hand the raw bytes to feedparser
response = requests.get(url)
feed = feedparser.parse(response.content)

for entry in feed['entries']:
    # The entries carry a filing date rather than a pub_date
    dt = datetime.datetime.strptime(entry['filing-date'], '%Y-%m-%d')
    print('Date: ', dt.strftime('(%Y/%m/%d)'))
    print('Title: ', entry['title'])
    print(entry['link'])
    print('\n')

The feed returned by this URL has no pub_date field, but you can use the filing date or pick another date. You should get output like the following:

Date:  (2021/03/11)
Title:  8-K - Current report
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000075/0001616707-21-000075-index.htm

Date:  (2021/02/25)
Title:  S-8 - Securities to be offered to employees in employee benefit plans
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000066/0001616707-21-000066-index.htm

Date:  (2021/02/25)
Title:  10-K - Annual report [Section 13 and 15(d), not S-K Item 405]
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000064/0001616707-21-000064-index.htm
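
A likely reason the original script fails is that the URL asks for output=atom, so the response is an Atom document, while atoma.parse_rss_bytes expects RSS (atoma ships a separate parse_atom_bytes function for Atom input, which may be worth trying). The sketch below uses only the standard library to check which kind of feed a response is and to pull the same three fields out of an Atom entry. SAMPLE_FEED is a hand-written stand-in with a placeholder link, not a copy of the real SEC response, and the helper names detect_feed_kind and parse_atom_entries are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace that every element of an Atom document lives in
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hand-written sample for illustration only (assumption: the real
# SEC response has more fields and may be laid out differently).
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample filings</title>
  <entry>
    <title>8-K - Current report</title>
    <link href="https://example.com/filing-index.htm" rel="alternate"/>
    <content type="text/xml">
      <filing-date>2021-03-11</filing-date>
    </content>
    <updated>2021-03-11T16:31:32-05:00</updated>
  </entry>
</feed>
"""


def detect_feed_kind(xml_bytes):
    """Return 'atom', 'rss', or 'unknown' based on the root element."""
    root = ET.fromstring(xml_bytes)
    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    return "unknown"


def parse_atom_entries(xml_bytes):
    """Collect date, title and link from each Atom <entry>."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root.findall(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="").strip()
        link = entry.find(ATOM_NS + "link")
        href = link.get("href") if link is not None else None
        # Children of <content> inherit the default Atom namespace
        raw_date = entry.findtext(ATOM_NS + "content/" + ATOM_NS + "filing-date")
        date = None
        if raw_date:
            parsed = datetime.strptime(raw_date.strip(), "%Y-%m-%d")
            date = parsed.strftime("(%Y/%m/%d)")
        entries.append({"date": date, "title": title, "link": href})
    return entries


if __name__ == "__main__":
    print("feed kind:", detect_feed_kind(SAMPLE_FEED))
    for item in parse_atom_entries(SAMPLE_FEED):
        print("Date: ", item["date"])
        print("Title: ", item["title"])
        print(item["link"])
```

To try this against the live feed, pass response.content from the requests call above in place of SAMPLE_FEED. If detect_feed_kind reports "atom", an RSS-only parser is the wrong tool for that response.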
