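How do I remove common words from two documents extracted from two websites? I have extracted news from two sites, and now I want to remove the common words from both documents. I used the following code to extract the news from the two websites: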
from __future__ import unicode_literals
import feedparser
import re
# Parse the BBC News RSS feed
d = feedparser.parse('http://feeds.bbci.co.uk./news/rss.xml')
i = 0
for post in d.entries:
    titl = post.title
    desc = post.description
    # Replace backslashes in the title with spaces
    titl2 = titl.replace('\\', " ")
    desc1 = desc.replace('/', " ")
    print(str(i) + ' ' + titl2)
    i = i + 1
print("Indian Express")
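# Parse the second feed (RSSMicro search results for "Android")
g = feedparser.parse('http://www.rssmicro.com/rss.web?q=Android')
i = 0
for post in g.entries:
    # Title of the current entry of this feed
    tit = post.title
    #desc = post.description
    tit4 = tit.replace('\\', " ")
    print(str(i) + ' ' + tit4)
    i = i + 1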
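A minimal sketch of one way to do the removal, assuming "common words" means words that appear in both documents and that each document is simply a list of headline strings. The helper names `tokenize` and `remove_common_words` and the sample titles are hypothetical, made up for illustration; the code lowercases the text, splits it into word tokens with `re.findall`, takes the set intersection of the two vocabularies, and filters those shared words out of each document.

```python
import re

def tokenize(text):
    # Hypothetical helper: lowercase the text and split it into word tokens
    return re.findall(r"\w+", text.lower())

def remove_common_words(doc_a, doc_b):
    """Return both documents as word lists with the shared words removed."""
    words_a = tokenize(" ".join(doc_a))
    words_b = tokenize(" ".join(doc_b))
    # Words that occur in both documents
    common = set(words_a) & set(words_b)
    only_a = [w for w in words_a if w not in common]
    only_b = [w for w in words_b if w not in common]
    return only_a, only_b, common

# Made-up sample titles standing in for the titles of the two feeds
bbc_titles = ["New Android phone launched in India", "Election results announced"]
other_titles = ["Android update released", "Phone sales rise in India"]

only_bbc, only_other, shared = remove_common_words(bbc_titles, other_titles)
print(sorted(shared))  # ['android', 'in', 'india', 'phone']
print(only_bbc)        # ['new', 'launched', 'election', 'results', 'announced']
print(only_other)      # ['update', 'released', 'sales', 'rise']
```

With the feeds above, the two lists could be built as `[post.title for post in d.entries]` and `[post.title for post in g.entries]`. If "common words" instead means frequent stop words such as "the" or "in", the same filtering works with a fixed stop-word set in place of the intersection.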