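I want to extract the content of a DIV tag. I am using Scrapy to scrape some websites, but the problem is that the same DIV tag comes with two kinds of content: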

["<div class=\"price\">\n                <s>Rs.330</s> <b>Rs.297</b>\n                              </div>"]

["<div class=\"price\">\n                Rs.330              \n</div>"] 

How can I extract the content from this tag in both cases?
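Both fragments share the same `<div class="price">` wrapper; only the inner markup differs. A minimal dependency-free sketch of one way to normalize them, using the standard library's `html.parser` instead of Scrapy or BeautifulSoup, is to collect every non-blank text chunk inside the div. The names `PriceTextParser`, `extract_prices`, `CASE_DISCOUNTED` and `CASE_PLAIN` are hypothetical, and the sketch assumes the `class` attribute is exactly `"price"`.

```python
from html.parser import HTMLParser

# The two fragments from the question.
CASE_DISCOUNTED = '<div class="price">\n                <s>Rs.330</s> <b>Rs.297</b>\n                              </div>'
CASE_PLAIN = '<div class="price">\n                Rs.330              \n</div>'


class PriceTextParser(HTMLParser):
    """Collect non-blank text chunks found inside <div class="price">."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while we are inside the price div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested divs so we leave at the matching </div>.
            if tag == "div":
                self.depth += 1
        elif tag == "div" and ("class", "price") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)


def extract_prices(html):
    """Return the price strings in document order, e.g. ['Rs.330', 'Rs.297']."""
    parser = PriceTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

With two chunks, the first is the struck-through original price and the last is the current price; with a single chunk, both are the same value.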

1 Answer

Use BeautifulSoup:

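import bs4

html = "<div class=\"price\">\n                <s>Rs.330</s> <b>Rs.297</b>\n                              </div>"
# "html.parser" is built in; features="xml" would require lxml to be installed
soup = bs4.BeautifulSoup(html, "html.parser")
s = soup.div.s.text  # 'Rs.330' (original, struck-through price)
b = soup.div.b.text  # 'Rs.297' (discounted price)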
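The snippet above assumes the `<s>`/`<b>` layout; on the second fragment `soup.div.s` is `None`, so `.text` raises `AttributeError`. A sketch that handles both layouts, assuming the same `bs4` package is installed; `parse_price` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

CASE_DISCOUNTED = '<div class="price">\n                <s>Rs.330</s> <b>Rs.297</b>\n                              </div>'
CASE_PLAIN = '<div class="price">\n                Rs.330              \n</div>'


def parse_price(html):
    """Return (original_price, current_price) from a <div class="price"> fragment.

    Layout with <s>/<b>: ('Rs.330', 'Rs.297')
    Plain text only:     (None, 'Rs.330')
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="price")
    if div is None:
        return None, None
    s, b = div.find("s"), div.find("b")
    if b is not None:
        original = s.get_text(strip=True) if s is not None else None
        return original, b.get_text(strip=True)
    # No inner markup: the whole stripped text is the price.
    return None, div.get_text(strip=True)
```

Returning `None` for the missing original price keeps the caller's code uniform: it can always read the second element as the price to display.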
answered 2013-05-17T06:40:40.410