Suppose I have HTML that contains this meta tag:

<meta property="og:country-name" content="South Africa"/>

The problem is that I need to extract the country name from the full HTML markup of the page.

import urllib2
from bs4 import BeautifulSoup as BS
url = "http://mydomain.com"  # urlopen needs a scheme such as http://
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
soup = BS(data)
print soup.findAll(...

I'm not sure what should come next. Any suggestions?

1 Answer

Search for a <meta> tag that has the specific attributes:

country_meta = soup.find('meta', attrs={'property': 'og:country-name', 'content': True})
if country_meta:
    country = country_meta['content']

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <html><head>
...     <meta property="og:country-name" content="South Africa"/>
...     <title>Foo</title>
... </head><body></body></html>''')
>>> country_meta = soup.find('meta', attrs={'property': 'og:country-name', 'content': True})
>>> country_meta
<meta content="South Africa" property="og:country-name"/>
>>> print country_meta['content']
South Africa
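
If the same lookup is needed for several properties, it could be wrapped in a small helper. The following is only a sketch for Python 3: the function name get_meta_content, its default parameter, and the og:locality property used in the second call are hypothetical names chosen for illustration, and the built-in "html.parser" backend is assumed rather than whatever parser the original code picked up.

```python
from bs4 import BeautifulSoup


def get_meta_content(html, prop, default=None):
    """Return the content of the first <meta property=prop> tag, or default.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    # Name the parser explicitly so the result does not depend on which
    # optional parsers (lxml, html5lib) happen to be installed.
    soup = BeautifulSoup(html, "html.parser")
    # content=True matches only tags that actually carry a content attribute.
    tag = soup.find("meta", attrs={"property": prop, "content": True})
    return tag["content"] if tag is not None else default


html = """<html><head>
    <meta property="og:country-name" content="South Africa"/>
    <title>Foo</title>
</head><body></body></html>"""

print(get_meta_content(html, "og:country-name"))
print(get_meta_content(html, "og:locality", default="unknown"))
```

The first call prints South Africa; the second prints unknown, because the sample markup has no og:locality tag and the fallback value is returned instead of raising an error.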
answered 2013-09-06T14:53:01.520