Java - ROME: I am trying to parse an RSS feed, but I get an error on some channels

Question

I am trying to consume an RSS feed and parse it. I found ROME and am trying to use it with this code:

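import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.FeedException;
import com.rometools.rome.io.ParsingFeedException;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

import java.io.IOException;
import java.net.URL;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

    // Downloads the document at the given URL and parses it as a feed.
    private SyndFeed parseFeed(String url) throws IllegalArgumentException, FeedException, IOException {
        return new SyndFeedInput().build(new XmlReader(new URL(url)));
    }


    // Reads the first entry of the feed; returns false if fetching or parsing fails.
    public Boolean processRSSContent(String url) {
        try {
            SyndFeed theFeed = this.parseFeed(url);
            SyndEntry entry = theFeed.getEntries().get(0);
            ZonedDateTime entryUtcDate = ZonedDateTime.ofInstant(entry.getPublishedDate().toInstant(), ZoneOffset.UTC);
            String entryTitle = entry.getTitle();
            String entryText = entry.getDescription().getValue();
            return true;
        }
        catch (ParsingFeedException e) {
            e.printStackTrace();
            return false;
        }
        catch (FeedException e) {
            e.printStackTrace();
            return false;
        }
        catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }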

On some channels, such as http://feeds.bbci.co.uk/news/world/rss.xml, everything works, but on others, such as http://habrahabr.ru/rss/, I get this error:

Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".
com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".

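I looked at the content behind this link, and the XML really is strange. But it is a popular site, and I get the same error on some other sites, so I do not think the XML itself is the problem. What am I doing wrong? How can I read this RSS channel?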

score 4 · Accepted Answer

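If you open the URL http://habrahabr.ru/rss/ in a browser, you will notice that it redirects to https://habrahabr.ru/rss/interesting. Your code does not handle the redirect, so ROME is most likely parsing the body of the redirect response rather than the feed, which would explain the complaint about an unclosed "meta" element.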

I suggest using HttpClientFeedFetcher from the rome-fetcher module, which handles redirects and offers other advanced features (caching, conditional GET, compression):

HttpClientFeedFetcher feedFetcher = new HttpClientFeedFetcher();
try {
    SyndFeed feed = feedFetcher.retrieveFeed(new URL("http://habrahabr.ru/rss/"));
    System.out.println(feed.getLink());
} catch (IllegalArgumentException | IOException | FeedException | FetcherException e) {
    e.printStackTrace();
}

Edit: rome-fetcher has since been deprecated, but Apache HttpClient can be used instead, and it is more flexible.
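For readers who would rather not add another dependency, here is a minimal sketch of the same idea using the JDK's built-in java.net.http.HttpClient (Java 11+) instead of Apache HttpClient. It is a substitution, not what the answer above used. The class name RedirectingFeedDownloader, the method name download, and the timeout values are assumptions made for illustration. The relevant setting is Redirect.NORMAL, which follows redirects including http to https; as far as I know, the URLConnection-based approach in the question does not follow redirects that switch protocol.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: downloads a feed body after following redirects.
public class RedirectingFeedDownloader {

    // Redirect.NORMAL follows redirects, including http -> https,
    // but never https -> http. Timeouts are illustrative values.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Returns the raw bytes served at the final (post-redirect) location. */
    public static byte[] download(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<byte[]> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofByteArray());
        // Anything other than 200 is treated as a failure so that an error
        // page is never handed to the feed parser by mistake.
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode()
                    + " for " + response.uri());
        }
        return response.body();
    }
}
```

The returned bytes can then be passed to ROME in place of the URL, for example with new SyndFeedInput().build(new XmlReader(new ByteArrayInputStream(bytes))), leaving the rest of processRSSContent unchanged.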
