到目前为止,我已经这样做了:
import urllib2,re,time
from bs4 import BeautifulSoup
base_url="http://nairobinow.wordpress.com/"
rawEventsData=urllib2.urlopen(base_url).read()
rawEventssoup = BeautifulSoup(rawEventsData)
events=rawEventssoup.findAll("div", {"id": re.compile(r'post-[\d+]')})
现在我想在标签、地点和日期之后获取数据。这是事件块(只是迭代部分之一):
<div class="post-17149 post type" id="post-17149">
<h2><a href="http://nairobinow.wordpress.com/2012/11/05/out/">Out of Town: Lamuest</a>
</h2><p>u
Dates: November 15-18, 2012<br/>
Venue: Lamu</p>
<p>Accommodation information: <a href="http://.../index.html"target="_blank"
>http://www.lamu.org/index.html</a></p></div>
任何帮助将不胜感激