How can I find all spans with a class of 'blue' that contain text in the following format:
04/18/13 7:29pm
So it could be:
04/18/13 7:29pm
or:
Posted on 04/18/13 7:29pm
In terms of building the logic to do this, here is what I have so far:
new_content = original_content.find_all('span', {'class' : 'blue'}) # using beautiful soup's find_all
pattern = re.compile('<span class=\"blue\">[data in the format 04/18/13 7:29pm]</span>') # using re
for _ in new_content:
    result = re.findall(pattern, _)
    print result
I have been referring to https://stackoverflow.com/a/7732827 and https://stackoverflow.com/a/12229134 to try to work out a way to do this, but the above is all I have so far.
Edit:
To clarify the scenario, there are spans like:
<span class="blue">here is a lot of text that i don't need</span>
and
<span class="blue">this is the span i need because it contains 04/18/13 7:29pm</span>
Note that I only need the 04/18/13 7:29pm, not the rest of the content.
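For illustration, here is a minimal sketch of pulling only the timestamp out of a span's text. It assumes the text of each span is already available as a plain string (for example via Beautiful Soup's get_text()); the names TIMESTAMP, extract_timestamp and span_texts are hypothetical and exist only for this example.

```python
import re

# Two-digit month/day/year, then a 12-hour time with an am/pm suffix.
TIMESTAMP = re.compile(r'\d\d/\d\d/\d\d \d\d?:\d\d[ap]m')


def extract_timestamp(span_text):
    """Return the first timestamp found in span_text, or None if there is none."""
    match = TIMESTAMP.search(span_text)
    return match.group(0) if match else None


# Hypothetical span texts, standing in for what get_text() might return.
span_texts = [
    "here is a lot of text that i don't need",
    "this is the span i need because it contains 04/18/13 7:29pm",
    "Posted on 04/18/13 7:29pm",
]

# Keep only the spans that actually contain a timestamp.
timestamps = [t for t in map(extract_timestamp, span_texts) if t is not None]
print(timestamps)
```

Searching the text rather than the markup means the pattern does not need to know about the surrounding span tag, so spans without a timestamp simply yield None and are skipped.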
Edit 2:
I have also tried:
pattern = re.compile('<span class="blue">.*?(\d\d/\d\d/\d\d \d\d?:\d\d\w\w)</span>')
for _ in new_content:
    result = re.findall(pattern, _)
    print result
and got the error:
'TypeError: expected string or buffer'
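A likely reading of this error is that re.findall is being handed a parsed element rather than a string, since find_all returns tag objects. The sketch below reproduces that situation without Beautiful Soup: FakeTag is a hypothetical stand-in class (not part of any library) that, like a parsed tag, is not a string but can be converted to one with str().

```python
import re


class FakeTag:
    """Hypothetical stand-in for a parsed element: not a string, but str() gives markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


pattern = re.compile(r'<span class="blue">.*?(\d\d/\d\d/\d\d \d\d?:\d\d\w\w)</span>')
tag = FakeTag('<span class="blue">this is the span i need because it contains 04/18/13 7:29pm</span>')

# Passing the object itself fails, because the re functions only accept strings.
try:
    re.findall(pattern, tag)
    raised = False
except TypeError:
    raised = True

# Converting to a string first lets the same pattern match.
result = re.findall(pattern, str(tag))
print(raised, result)
```

Under that assumption, wrapping the loop variable in str() before calling re.findall would be the smallest change to the attempt above.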