I want the output to be "Hindi" and "English". I can get "Hindi", but I am having trouble getting "English".

Input:

<td class="_480u">
<div class="clearfix">
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and 
      <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>

Code I tried:

>>> details.find('a',{'class':''}).string
u'Hindi'

s = details.findAll('a',{'class':''})
s1 = len(s)
list2 = []
if s1 >= 1:
   for j in range(0,s1):
      lang = s[j].find('a',{'class':''}).string.strip()
      list2.append(lang)
Traceback (most recent call last):
  File "<pyshell#220>", line 9, in <module>
    lang = s[j].find('a',{'class':''}).string.strip()
AttributeError: 'NoneType' object has no attribute 'string'


>>> s
[<a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a>, <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a>]

1 Answer

If that is the exact HTML and it will not change, you can use the following:

from bs4 import BeautifulSoup

html = '<td class="_480u">\
<div class="clearfix">\
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and \
      <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>'

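# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
first = soup.find('a',{'class':''})
print(first.string)
# The first sibling is the text node " and "; the second sibling is the next <a> tag
print(first.next_sibling.next_sibling.string)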

Output:

Hindi
English
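
Hopping two siblings ahead works here only because exactly one text node (" and ") sits between the two links. As a minimal sketch, assuming the same markup with the `data-hovercard` attributes trimmed for brevity, `find_next_sibling` can skip the intervening text regardless of how many nodes there are. The variable names `first`, `second`, and `languages` are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Same structure as the HTML in the question; data-hovercard attributes trimmed for brevity
html = (
    '<td class="_480u"><div class="clearfix"><div>'
    '<a href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and '
    '<a href="https://www.facebook.com/pages/English/106059522759137">English</a>'
    '</div></div></td>'
)

soup = BeautifulSoup(html, "html.parser")
first = soup.find("a")

# find_next_sibling("a") skips text nodes and returns the next <a> sibling, or None
second = first.find_next_sibling("a")

languages = [first.get_text(strip=True)]
if second is not None:
    languages.append(second.get_text(strip=True))

print(languages)
```

The `None` check matters if a row lists only one language, in which case there is no second link to read.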

Or you can do it this way (if you are only working with the HTML you posted in the question):

from bs4 import BeautifulSoup

html = '<td class="_480u">\
<div class="clearfix">\
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and \
      <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>'

soup = BeautifulSoup(html, 'html.parser')
# href=True keeps only the <a> tags that carry an href attribute
lang = soup.find_all('a', href=True)
for i in lang:
    print(i.string)

Output:

Hindi
English
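
As for the `AttributeError` in the question: the list `s` already holds the two `<a>` tags, so calling `.find('a', ...)` on each element searches inside that link for a nested `<a>`, finds none, and returns `None`. A possible corrected version of the loop is sketched below. It assumes `details` is the enclosing `<td>` element, which the question does not show, and it uses trimmed markup.

```python
from bs4 import BeautifulSoup

# Same structure as the HTML in the question; data-hovercard attributes trimmed for brevity
html = (
    '<td class="_480u"><div class="clearfix"><div>'
    '<a href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and '
    '<a href="https://www.facebook.com/pages/English/106059522759137">English</a>'
    '</div></div></td>'
)

soup = BeautifulSoup(html, "html.parser")
details = soup.find("td")  # assumption: stands in for the `details` object in the question

links = details.find_all("a")

# Each item is already an <a> tag; searching inside it for another <a> yields None,
# which is what triggered the AttributeError in the question
nested = links[0].find("a")

# Read the text directly from each link instead
languages = [a.get_text(strip=True) for a in links]
print(languages)
```

Iterating over the result of `find_all` directly also removes the need for the length check and the index-based `range` loop.
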
answered 2013-08-27T08:19:31.777