I'm scraping a page that has a link like this:

<a id="something" href="place" class="thing" data="12345">
<span class="otherthing"></span></a>

I'd like to extract the number in the field called data. I've been trying to use BeautifulSoup like this:

soup = BeautifulSoup(response)
for a in soup.findAll('a'):
        if 'data' in a['a']:
                print a['a']['data']

But I'm getting a key error.

2 Answers

To get only the <a> elements that have a data attribute:

data = [a['data'] for a in soup.findAll('a', data=True)]

To keep only the elements whose data attribute contains an integer:

import re

data = [int(a['data']) for a in soup.findAll('a', data=re.compile(r"^\d+$"))]
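
For anyone who wants to try both filters end to end, here is a minimal self-contained sketch. It assumes the bs4 package is installed and uses the built-in "html.parser"; the two extra <a> tags and the variable names are made up for illustration and are not from the question. It passes the filter through attrs= instead of a keyword argument, which should behave the same way for an attribute named data.

```python
import re
from bs4 import BeautifulSoup

# Sample markup: the first link is from the question, the other two are
# hypothetical additions to show what each filter keeps or drops.
html = """
<a id="something" href="place" class="thing" data="12345"><span class="otherthing"></span></a>
<a id="other" href="elsewhere" data="n/a">text</a>
<a id="plain" href="nowhere">no data attribute</a>
"""

soup = BeautifulSoup(html, "html.parser")

# True matches any <a> that has a data attribute, whatever its value.
with_data = [a["data"] for a in soup.find_all("a", attrs={"data": True})]

# A compiled regex matches only values made up entirely of digits.
numeric = [
    int(a["data"])
    for a in soup.find_all("a", attrs={"data": re.compile(r"^\d+$")})
]

print(with_data)  # ['12345', 'n/a']
print(numeric)    # [12345]
```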
answered 2013-04-28T03:20:57.580

Maybe this is what you need:

for a in soup.findAll('a'):
    if a.has_attr('data'):
        print(a['data'])
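
As for the KeyError in the question: a['a'] asks the tag for an attribute named "a", which this link does not have, so the lookup fails before 'data' is ever checked. A rough sketch of an alternative that avoids the explicit has_attr check is Tag.get, which returns None when the attribute is missing. This assumes bs4 with the built-in "html.parser"; the second link and the values list are only there for illustration.

```python
from bs4 import BeautifulSoup

# First link is from the question; the second is a hypothetical link
# without a data attribute.
html = (
    '<a id="something" href="place" class="thing" data="12345">'
    '<span class="otherthing"></span></a>'
    '<a href="elsewhere">no data here</a>'
)

soup = BeautifulSoup(html, "html.parser")

values = []
for a in soup.find_all("a"):
    # a["data"] would raise KeyError on the second link;
    # a.get("data") returns None instead.
    value = a.get("data")
    if value is not None:
        values.append(value)

print(values)  # ['12345']
```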
answered 2013-04-27T16:27:27.690