Never mind, I was wrong:
from bs4 import BeautifulSoup

def grab(soup):
    return ' '.join(unicode(i.string) for i in soup.body.contents)
# soup.body.contents is a list of the direct children of <body>, e.g.:
#   [<span>this is a</span>, u'cat']
#   [<p>Spelled f<b>o</b>etus in British English with extra "o"</p>]
# i.string returns the text of a tag, similar to .text, but it returns None
# when the tag has more than one child (for example, when it contains other tags).
# unicode() converts the bs4 object (or None) to a plain unicode string so that
# ' '.join() can be called on the results.
# It's better to use unicode() than str(). From the Beautiful Soup documentation:
## If you want to use a NavigableString outside of Beautiful Soup,
## you should call unicode() on it to turn it into a normal
## Python Unicode string. If you don’t, your string will carry around
## a reference to the entire Beautiful Soup parse tree, even when
## you’re done using Beautiful Soup. This is a big waste of memory.
# Lastly, as .contents returns a list, we join it together.
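# The parser is named explicitly: the <body> contents listed in the comments
# above match what lxml produces (it wraps the fragment in <html><body>).
soup1 = BeautifulSoup('<span>this is a</span>cat', 'lxml')
soup2 = BeautifulSoup('Spelled f<b>o</b>etus in British English with extra "o"', 'lxml')
soups = [soup1, soup2]  # a list of the soups to process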
for i in soups:
    result = grab(i)  # either u'None' or the correct string joined with a space
    if result == 'None':  # a tag was nested inside the text (i.e. like your second example)
        print i.text
    else:
        print result  # the result with a space
Prints:
this is a cat
Spelled foetus in British English with extra "o"
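
The check against 'None' above depends on how .string behaves, so here is a minimal sketch of that behaviour on its own. It is not part of the answer's code: it assumes Python 3 with bs4 installed, uses the built-in html.parser, and the sample markup and variable names are made up for illustration. In Python 3, str() plays the role that unicode() plays above.

```python
from bs4 import BeautifulSoup

# Illustrative markup only; html.parser ships with Python, so no extra parser is needed.
html = '<div><span>this is a</span><p>Spelled f<b>o</b>etus</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# <span> has a single text child, so .string returns that text.
span_string = soup.span.string

# <p> has three children (text, <b>, text), so .string is None.
p_string = soup.p.string

# .get_text() (the method behind .text) concatenates every nested string instead.
p_text = soup.p.get_text()

print(repr(str(span_string)))
print(repr(p_string))
print(repr(p_text))
```

That is why grab() can end up with the literal string 'None' for the second example, and why falling back to .text recovers the full sentence.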