How do I parse the start-date and end-date values with BeautifulSoup?
<h2 name="PRM-013113-21017-0FSNS" class="pointer">
<a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br>
<span>February 8, 2013 - February 10, 2013</span>
</a>
</h2>
Something like this should work:
import re
from BeautifulSoup import BeautifulSoup
html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date_span = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = re.findall(r'<span>(.+?)</span>', str(date_span))[0]
(PS: You can also pass text=True to BeautifulSoup's findAll method to get the text directly instead of using a regular expression, as shown below.)
from BeautifulSoup import BeautifulSoup
html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = date.findAll(text=True)[0]
To get the start date and end date as separate variables, you can simply split the date string on the ' - ' separator, as shown below:
from BeautifulSoup import BeautifulSoup
html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = date.findAll(text=True)[0]
# Get start and end date separately
date_start, date_end = date.split(' - ')
Now the date_start variable contains the start date, and the date_end variable contains the end date.
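If you need real date objects rather than strings, the split can be followed by a parsing step. The snippet below is only a minimal sketch using the standard library: the helper name parse_date_range is hypothetical, and it assumes the text always looks like "Month D, YYYY - Month D, YYYY" with English month names (strptime's %B depends on the active locale).

```python
from datetime import datetime


def parse_date_range(text, sep=" - ", fmt="%B %d, %Y"):
    """Split 'start - end' text and parse each side into a date object.

    Hypothetical helper: assumes exactly one separator and English
    month names such as 'February'.
    """
    start_text, end_text = text.split(sep)
    start = datetime.strptime(start_text.strip(), fmt).date()
    end = datetime.strptime(end_text.strip(), fmt).date()
    return start, end


# The string extracted from the <span> in the question
date_start, date_end = parse_date_range("February 8, 2013 - February 10, 2013")
print(date_start, date_end)
```

With date objects you can compare the two values or compute the length of the sale, which is not possible with the raw strings.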