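I'm trying to parse the RSS data from this feed: http://fulltextrssfeed.com/feeds.bbci.co.uk/news/rss.xml, which was generated with the FullTextRssFeed site. The only problem is that when I try to get the description, all I get is '<'. Everything else works fine! I've tried to do this with JSoup, but I can't figure out how. Can you suggest how to do it? I'm using the same code as in this tutorial, except that I replaced the RSS URL. Thanks again!
4 Answers
Your problem is caused by the description in your RSS feed containing HTML rather than plain text. Here is the description content: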
<div><span class="story-date"><span class="date">3 April 2013</span> <span class="time-text">Last updated at</span> <span class="time">23:25 ET</span></span> <p><img src="http://news.bbcimg.co.uk/media/images/66739000/jpg/_66739180_philpotts.jpg" width="464" height="261" alt="Mick and Mairead Philpott, Paul Mosley"/><span class="c2">Mick and Mairead Philpott, and Paul Mosley, will be sentenced on Thursday</span></p> <p class="introduction" id="story_continues_1">A couple convicted of killing six of their children in a house fire in Derby are due to be sentenced later.</p> <p>Mick and Mairead Philpott will reappear at Nottingham Crown Court where they were found guilty of six counts of manslaughter, along with their friend Paul Mosley, on Tuesday.</p> <p>The maximum sentence for the crime is life imprisonment.</p> <p>Mrs Justice Thirlwall was due to pass sentence on Wednesday but needed more time to consider mitigation.</p> <p>The court was told that Philpott, 56, was jailed for seven years in 1978 for attempting to murder a previous girlfriend and given a concurrent five-year sentence for stabbing the woman's mother.</p> <p>In 1991 he received a conditional discharge for assault after he head-butted a colleague</p> <p>And in 2010 he was given a police caution after slapping Mairead and dragging her outside by her hair.</p> <p>When Philpott set fire to his house in Victory Road, Derby, he was also facing trial over a road rage incident in which he punched a motorist in the face.</p> <p>He had admitted common assault in relation to the incident but denied dangerous driving.</p> <span class="cross-head">Rape allegation</span> <p>Police have also confirmed that they intend to "thoroughly" investigate an allegation that Philpott raped a woman several years ago.</p> <p>She made the allegation after the death of Philpott's children, but police decided to wait until the end of the manslaughter trial before investigating the complaint further.</p> <p>On Tuesday the jury 
returned unanimous manslaughter verdicts on Philpott and Mosley, 46, while Mairead Philpott, 32, was convicted by a majority.</p> <p>Jade Philpott, 10, John, nine, Jack, eight, Jesse, six, and Jayden, five, died on the morning of the fire on 11 May 2012.</p> <p>Mairead Philpott's son from a previous relationship, 13-year-old Duwayne, died later in hospital.</p> </div><img src="http://pixel.quantserve.com/pixel/p-89EKCgBk8MZdE.gif" border="0" height="1" width="1" />
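You need to change your parser so that it treats the HTML inside the description as text and keeps all of it, instead of stopping at the first '<'. Once you have the full HTML snippet, you can render it in a WebView. I believe CDATA is usually used when a piece of XML data such as an RSS feed contains some other kind of markup (HTML, in this case). To be honest, though, I'm not familiar with the ins and outs of it, so I could be wrong.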
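A minimal sketch of what "changing the parser" can look like, assuming your handler is a SAX DefaultHandler like the ones in many Android RSS tutorials (the tutorial's actual code isn't shown here, so that is an assumption). SAX is allowed to deliver a single text node through several characters() calls, and parsers often split it around entities such as &lt;. A handler that keeps only one chunk can therefore end up with just "<". The fix is to append every chunk to a StringBuilder and read the result in endElement(). The class and method names (DescriptionHandlerDemo, DescriptionHandler, parseDescription) are hypothetical. The demo also shows that entity-escaped HTML and a CDATA section come out as the same string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DescriptionHandlerDemo {

    // Hypothetical handler: collects the full <description> text of the first <item>.
    static class DescriptionHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inItem = false;
        private boolean inDescription = false;
        private String description = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("item")) {
                inItem = true;
            } else if (inItem && qName.equals("description") && description == null) {
                inDescription = true;
                buffer.setLength(0); // fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so append, never overwrite.
            if (inDescription) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inDescription && qName.equals("description")) {
                description = buffer.toString(); // the whole text, all chunks joined
                inDescription = false;
            } else if (qName.equals("item")) {
                inItem = false;
            }
        }

        String getDescription() {
            return description;
        }
    }

    static String parseDescription(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DescriptionHandler handler = new DescriptionHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getDescription();
    }

    public static void main(String[] args) throws Exception {
        String escaped = "<rss><channel><item><description>&lt;p&gt;Hello &amp; welcome&lt;/p&gt;"
                + "</description></item></channel></rss>";
        String cdata = "<rss><channel><item><description><![CDATA[<p>Hello & welcome</p>]]>"
                + "</description></item></channel></rss>";
        System.out.println(parseDescription(escaped)); // prints <p>Hello & welcome</p>
        System.out.println(parseDescription(cdata));   // prints <p>Hello & welcome</p>
    }
}
```

The string returned by getDescription() is the complete HTML snippet, which you can then pass to a WebView or to a Jsoup-based text extractor.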
The HTML you get from myRssFeed.getDescription() looks like this:
<div><span class="story-date"><span class="date">6 April 2013</span> <span class="time-text">Last updated at</span> <span class="time">08:57 ET</span></span> <p><img src="http://news.bbcimg.co.uk/media/images/51606000/jpg/_51606573_fa1d16c0-9c6c-4f82-b0b8-ab66ddd94f78.jpg" width="304" height="171" alt="Breaking news"/></p> <p class="introduction">Nelson Mandela has been discharged from hospital after treatment for pneumonia, South Africa's government has said.</p> <p>It said there had been "a sustained and gradual improvement in his condition".</p> <p>The 94-year-old was admitted on 27 March for a recurring lung infection and had fluid drained at the undisclosed hospital.</p> <p>Mr Mandela served as South Africa's first black president from 1994 to 1999 and is regarded by many as the father of the nation.</p> <p>The <a href="http://redirect.viglink.com?key=11fe087258b6fc0532a5ccfc924805c0&u=http%3A%2F%2Fwww.thepresidency.gov.za%2Fpebble.asp%3Frelid%3D15178">presidency statement read</a>: "Former President Nelson Mandela has been discharged from hospital today, 6 April, following a sustained and gradual improvement in his general condition.</p> <p>"The former president will now receive home-based high care. President [Jacob] Zuma thanks the hard working medical team and hospital staff for looking after Madiba so efficiently."</p> <p>Madiba is Mr Mandela's clan name.</p> <p>The statement continued: "[Mr Zuma] also extended his gratitude to all South Africans and friends of the Republic in Africa and around the world for support."</p> </div><img src="http://pixel.quantserve.com/pixel/p-89EKCgBk8MZdE.gif" border="0" height="1" width="1" />
Using Jsoup, you could try this (untested):
Instead of
feedDescribtion.setText(myRssFeed.getDescription());
use this:
feedDescribtion.setText(extractDescriptionText(myRssFeed.getDescription()));
together with the following method:
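// required imports (top of the file)
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

private String extractDescriptionText(String description) {
    StringBuilder b = new StringBuilder();
    Document dom = Jsoup.parse(description);
    Elements paragraphs = dom.getElementsByTag("p");
    // start at index 1 to skip the first <p>, which only holds the image
    // (alt="Breaking news" in the example above)
    for (int i = 1; i < paragraphs.size(); i++) {
        Element p = paragraphs.get(i);
        b.append(p.text());
        b.append("\n"); // line break after each paragraph
    }
    return b.toString();
}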
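This should work. Some fine-tuning may be necessary, but that is easy to do with Jsoup.
Edit:
This is what extractDescriptionText() returns for the example above:
Nelson Mandela has been discharged from hospital after treatment for pneumonia, South Africa's government has said.
It said there had been "a sustained and gradual improvement in his condition".
The 94-year-old was admitted on 27 March for a recurring lung infection and had fluid drained at the undisclosed hospital.
Mr Mandela served as South Africa's first black president from 1994 to 1999 and is regarded by many as the father of the nation.
The presidency statement read: "Former President Nelson Mandela has been discharged from hospital today, 6 April, following a sustained and gradual improvement in his general condition.
"The former president will now receive home-based high care.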
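Since the answer notes that some fine-tuning may be necessary, here is one possible tweak, sketched under assumptions: instead of skipping the first <p> by position, it skips any paragraph whose visible text is empty (such as one that only wraps an <img>), and it puts a line break only between paragraphs. The class name DescriptionText is hypothetical, and the jsoup library must be on the classpath. One trade-off: in the Philpott example the first <p> also contains a caption <span>, which this variant keeps while the index-based version drops it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DescriptionText {

    // Hypothetical variant of extractDescriptionText(): skips empty paragraphs
    // instead of always skipping paragraph 0.
    static String extractDescriptionText(String description) {
        Document dom = Jsoup.parse(description);
        StringBuilder b = new StringBuilder();
        for (Element p : dom.select("p")) {
            String text = p.text().trim();
            if (text.isEmpty()) {
                continue; // e.g. an image-only paragraph: nothing to show in a TextView
            }
            if (b.length() > 0) {
                b.append('\n'); // one line break between paragraphs, none at the end
            }
            b.append(text);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        String html = "<div><p><img src=\"x.jpg\" alt=\"Breaking news\"/></p>"
                + "<p class=\"introduction\">First paragraph.</p><p>Second paragraph.</p></div>";
        // prints "First paragraph." and "Second paragraph." on two lines
        System.out.println(extractDescriptionText(html));
    }
}
```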
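While searching the web for ideas on how to do this, I found out that it is actually not permitted: fetching content this way violates the terms of use of many of the web resources I wanted to use. For now, you'll have to stick with the short RSS feeds.
I would have posted this as a comment, but I don't have enough reputation.
I'd suggest using Yahoo Pipes to redirect your RSS feed. You can even choose to have it output as JSON instead of XML.
If your parser works fine on most of the sites you access, this would be the easiest way to solve the problem.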