Here is the source:
<span class="new"> <a class="blog" href="http://whatever1.com" rel="nofollow">whatever1</a> do something at <a class="others" href="http://example1.com" rel="nofollow">example1</a></span>
<span class="new"> <a class="blog" href="http://whatever2.com" rel="nofollow">whatever2</a> do other things at <a class="others" href="http://example2.com" rel="nofollow">example2</a></span>
<span class="new"> <a class="blog" href="http://whatever3.com" rel="nofollow">whatever3</a> do something at <a class="others" href="http://example3.com" rel="nofollow">example3</a></span>
From this, I want to find every <span class="new"> element whose text contains "do something at".
Here is my code. I can't figure out why it doesn't work:
import re
import bs4
soup = bs4.BeautifulSoup(html, "lxml")
all_tags = soup.findAll(name = "span", attrs = {"class": "new"}, text = re.compile('do something.*'))
It finds nothing. If I remove text = re.compile('.*do something.*'), it returns
all of the tags above. I assume something is wrong with my regex pattern, so what is the correct form?
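
A likely explanation is that the regex is not the problem. When `text` is combined with a tag name, Beautiful Soup tests the pattern against each tag's `.string`, and `.string` is `None` for a tag that has more than one child. Every span here contains two links plus loose text, so nothing can match. The sketch below is one possible workaround, not the only one: it selects the spans by class first and then filters on `get_text()`. It assumes the sample HTML above is stored in a variable named `html`, and it uses the built-in `"html.parser"` in place of `"lxml"` only so that it runs without an extra dependency.

```python
import re

import bs4

# Sample markup copied from the question.
html = """
<span class="new"> <a class="blog" href="http://whatever1.com" rel="nofollow">whatever1</a> do something at <a class="others" href="http://example1.com" rel="nofollow">example1</a></span>
<span class="new"> <a class="blog" href="http://whatever2.com" rel="nofollow">whatever2</a> do other things at <a class="others" href="http://example2.com" rel="nofollow">example2</a></span>
<span class="new"> <a class="blog" href="http://whatever3.com" rel="nofollow">whatever3</a> do something at <a class="others" href="http://example3.com" rel="nofollow">example3</a></span>
"""

# Assumption: "html.parser" is swapped in for "lxml" to avoid a dependency.
soup = bs4.BeautifulSoup(html, "html.parser")
pattern = re.compile(r"do something at")

# Step 1: select the candidate spans by tag name and class only.
spans = soup.find_all("span", class_="new")

# Each span has several children, so .string is None for all of them.
strings = [span.string for span in spans]

# Step 2: filter on the combined text of each span and its descendants.
matching = [span for span in spans if pattern.search(span.get_text())]

# Collect the blog link text of each matching span for a quick check.
matched_blogs = [span.find("a", class_="blog").get_text() for span in matching]
print(matched_blogs)
```

With the sample above, only the first and third spans should survive the filter, since the second one reads "do other things at".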