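First of all, this is not strictly a programming question, and I apologize for posting it here, but I really need to know. I am building an RSS reader app, and I want to know where the information about the featured image is located in an RSS XML feed. Below is an excerpt of the XML I get from the CNN RSS feed. Where is the information about the images?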

<item><title>Ice melt speeding up, study finds</title><guid>http://edition.cnn.com/2012/11/29/world/europe/climate-ice-sheets/index.html</guid><link>http://edition.cnn.com/2012/11/29/world/europe/climate-ice-sheets/index.html?eref=edition</link><description>Two decades of satellite readings back up what dramatic pictures have suggested in recent years: The mile-thick ice sheets that cover Greenland and most of Antarctica are melting at a faster rate in a warming world.</description><pubDate>Thu, 27 Jun 2013 08:59:27 EDT</pubDate></item>
<item><title>Twins 'stolen' from hospital rescued</title><guid>http://edition.cnn.com/2013/08/10/world/asia/china-baby-trafficking-twin-girls/index.html</guid><link>http://edition.cnn.com/2013/08/10/world/asia/china-baby-trafficking-twin-girls/index.html?eref=edition</link><description>Police in China have rescued twin baby girls allegedly sold by a maternity doctor, bringing the number of infants recovered from the suspected trafficking ring to three, state media reported. </description><pubDate>Sun, 11 Aug 2013 19:31:43 EDT</pubDate></item>
<item><title>HK makes $5M ivory bust</title><guid>http://edition.cnn.com/2013/08/08/world/hong-kong-ivory-tusk-seizure-august/index.html</guid><link>http://edition.cnn.com/2013/08/08/world/hong-kong-ivory-tusk-seizure-august/index.html?eref=edition</link><description>In one of the biggest busts of its kind in Hong Kong, customs authorities this week seized more than 1,100 ivory tusks, 13 rhino horns and five leopard pelts. The haul, found in a container shipped from Nigeria, is valued at more than $5.3 million.</description><pubDate>Sun, 11 Aug 2013 19:31:58 EDT</pubDate></item>
<item><title>Human transmission of H7N9</title><guid>http://edition.cnn.com/2013/08/07/health/china-bird-flu-transmission/index.html</guid><link>http://edition.cnn.com/2013/08/07/health/china-bird-flu-transmission/index.html?eref=edition</link><description>Until this week, no cases of human-to-human transmission of the deadly bird flu virus that broke out in China this year had been reported.</description><pubDate>Wed, 07 Aug 2013 22:16:18 EDT</pubDate></item>
<item><title>Doctor accused of taking newborns</title><guid>http://edition.cnn.com/2013/08/07/world/asia/china-baby-trafficking-shaanxi/index.html</guid><link>http://edition.cnn.com/2013/08/07/world/asia/china-baby-trafficking-shaanxi/index.html?eref=edition</link><description>Chinese health authorities have promised an overhaul in hospitals across the country following the arrest of an obstetrician for allegedly selling newborns to human traffickers, state media reports.</description><pubDate>Wed, 07 Aug 2013 03:38:22 EDT</pubDate></item>
<item><title>Chinese tourists targeted in Paris</title><guid>http://edition.cnn.com/2013/08/07/travel/chinese-tourists-paris-pickpockets/index.html</guid><link>http://edition.cnn.com/2013/08/07/travel/chinese-tourists-paris-pickpockets/index.html?eref=edition</link><description>It's known as the City of Light, but it risks becoming known as the city of the light-fingered.</description><pubDate>Wed, 07 Aug 2013 22:16:33 EDT</pubDate></item>

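Do I have to write a web crawler that follows the feed links and scrapes the images and text from the target pages? I just need to know how professional RSS readers work.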

FYI, I have googled this a lot without success, which is why I am asking here. Please help.

1 Answer

Since this feed's XML contains no information about the images, you have to get it by crawling the linked pages.
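Before crawling, it may be worth checking whether a given feed carries image references at all. Some feeds do, through the RSS 2.0 `<enclosure>` element or the Media RSS extension (`media:thumbnail`, `media:content`); the CNN excerpt in the question has neither. A minimal sketch, assuming Python and its standard library. The names `item_image_url` and `images_by_title` and the second sample item are hypothetical, made up for illustration:

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # Media RSS namespace


def item_image_url(item):
    """Return an image URL embedded in an RSS <item>, or None if there is none."""
    # RSS 2.0 <enclosure> with an image MIME type
    enc = item.find("enclosure")
    if enc is not None and enc.get("type", "").startswith("image/"):
        return enc.get("url")
    # Media RSS <media:thumbnail>
    thumb = item.find(f"{{{MEDIA_NS}}}thumbnail")
    if thumb is not None and thumb.get("url"):
        return thumb.get("url")
    # Media RSS <media:content>, only when it is declared to be an image
    content = item.find(f"{{{MEDIA_NS}}}content")
    if content is not None and content.get("url"):
        is_image = (content.get("medium") == "image"
                    or content.get("type", "").startswith("image/"))
        if is_image:
            return content.get("url")
    return None


def images_by_title(feed_xml):
    """Map each item's title to its embedded image URL (or None)."""
    root = ET.fromstring(feed_xml)
    return {item.findtext("title"): item_image_url(item)
            for item in root.iter("item")}


# First item mirrors the CNN excerpt (no image); the second is invented.
SAMPLE = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel>
<item><title>Ice melt speeding up, study finds</title><link>http://edition.cnn.com/2012/11/29/world/europe/climate-ice-sheets/index.html?eref=edition</link></item>
<item><title>Hypothetical item</title><media:thumbnail url="http://example.com/thumb.jpg"/></item>
</channel></rss>"""

if __name__ == "__main__":
    for title, url in images_by_title(SAMPLE).items():
        print(title, "->", url)
```

Items for which this returns `None`, as every item in the CNN excerpt would, are the ones that need the crawling step.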

Do I have to write a web crawler that follows the feed links and scrapes the images and text from the target pages?

Yes. For the CNN stories you linked, the header image is always inside a div with the class "cnn_stryimg640captioned".

You will have to handle stories whose header is a video or a picture gallery separately.
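The scraping step for the plain-image case could look roughly like this. It is a sketch using only Python's standard `html.parser`, and it assumes the page markup still uses the class named above. The names `HeaderImageFinder` and `find_header_image` and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class HeaderImageFinder(HTMLParser):
    """Collect the src of the first <img> inside a <div> with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # <div> nesting depth inside the target div (0 = outside)
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter the target div, or track a nested div once inside it
            if self.depth or self.css_class in classes:
                self.depth += 1
        elif tag == "img" and self.depth and self.src is None:
            self.src = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def find_header_image(html, css_class="cnn_stryimg640captioned"):
    """Return the header image URL, or None (e.g. video or gallery headers)."""
    finder = HeaderImageFinder(css_class)
    finder.feed(html)
    return finder.src


if __name__ == "__main__":
    # Invented markup for illustration, not copied from a real CNN page
    page = (
        '<div class="nav"><img src="logo.png"></div>'
        '<div class="cnn_stryimg640captioned">'
        '<div><img src="http://example.com/story.jpg" /></div>'
        '<p>caption</p></div>'
        '<img src="ad.png">'
    )
    print(find_header_image(page))
```

The HTML itself would come from fetching each item's `<link>`, for example with `urllib.request.urlopen`. A `None` result is the signal to fall back to the separate video or gallery handling.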

I just need to know how professional RSS readers work.

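Professional RSS readers use some fancy algorithms to determine which images are relevant to an article. They don't always get it right; it is a hard problem.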
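I can't say what any particular reader actually does, but a very simple heuristic of this kind might be: prefer the page's Open Graph `og:image` meta tag if it has one, otherwise take the `<img>` with the largest declared size and ignore small ones such as icons. A rough sketch under those assumptions; the names `ImageCandidates` and `guess_article_image` and the `min_area` threshold are made up for illustration:

```python
from html.parser import HTMLParser


def _declared_area(attrs):
    """Width * height from the tag's attributes, or 0 if missing or non-numeric."""
    try:
        return int(attrs.get("width") or 0) * int(attrs.get("height") or 0)
    except ValueError:
        return 0


class ImageCandidates(HTMLParser):
    """Collect the og:image URL and every <img> with its declared area."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.images = []   # list of (declared area in pixels, src)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            self.og_image = self.og_image or attrs.get("content")
        elif tag == "img" and attrs.get("src"):
            self.images.append((_declared_area(attrs), attrs["src"]))


def guess_article_image(html, min_area=200 * 200):
    """Guess the most relevant image URL for a page, or None."""
    parser = ImageCandidates()
    parser.feed(html)
    if parser.og_image:
        return parser.og_image
    # Fall back to the largest image that is at least min_area pixels
    big = [c for c in parser.images if c[0] >= min_area]
    return max(big, key=lambda c: c[0])[1] if big else None
```

This only looks at sizes declared in the markup, so images sized by CSS are invisible to it, which is one small reason the real problem is hard.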

Answered 2013-08-12T15:43:14.757