python - What does this function do in Python involving urllib2 and BeautifulSoup?

Question

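I asked a question earlier about retrieving high scores from an HTML page, and another user gave me the following code to help. I am new to Python and BeautifulSoup, so I am trying to go through the code piece by piece. I understand most of it, but I don't understand what this piece of code is or what it does: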

    def parse_string(el):
       text = ''.join(el.findAll(text=True))
       return text.strip()

Here is the entire code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup
    import sys

    URL = "http://hiscore.runescape.com/hiscorepersonal.ws?user1=" + sys.argv[1]

    # Grab page html, create BeautifulSoup object
    html = urlopen(URL).read()
    soup = BeautifulSoup(html)

    # Grab the <table id="mini_player"> element
    scores = soup.find('table', {'id':'mini_player'})

    # Get a list of all the <tr>s in the table, skip the header row
    rows = scores.findAll('tr')[1:]

    # Helper function to return concatenation of all character data in an element
    def parse_string(el):
       text = ''.join(el.findAll(text=True))
       return text.strip()

    for row in rows:

       # Get all the text from the <td>s
       data = map(parse_string, row.findAll('td'))

       # Skip the first td, which is an image
       data = data[1:]

       # Do something with the data...
       print data

score 3 · Accepted Answer

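el.findAll(text=True) returns all the text contained within an element and its sub-elements. By text I mean everything that is not inside a tag; so in <b>hello</b>, "hello" would be the text, but <b> and </b> would not.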

Therefore, the function joins together all the text found beneath the given element and strips whitespace from the front and back.
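If it helps to see the idea in isolation, here is a rough sketch that mimics the same behaviour using only the standard library's html.parser on Python 3, so it runs without BeautifulSoup installed. It is not how BeautifulSoup works internally; the class name TextCollector and the sample cell markup are made up for illustration, and this parse_string takes an HTML string rather than a BeautifulSoup element.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers every piece of text, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for character data between tags, e.g. "hello" in <b>hello</b>
        self.chunks.append(data)


def parse_string(html_fragment):
    # Stand-in for the original helper: collect all text, join it, trim the ends
    collector = TextCollector()
    collector.feed(html_fragment)
    collector.close()
    return ''.join(collector.chunks).strip()


# Made-up table cell with nested tags and stray whitespace
cell = "<td>  <a href='#'>Attack</a> <b>99</b>\n</td>"

collector = TextCollector()
collector.feed(cell)
collector.close()
print(collector.chunks)           # the separate text pieces, whitespace included
print(repr(parse_string(cell)))   # 'Attack 99'
```

The list of chunks plays the role of what findAll(text=True) returns: one entry per run of text, including whitespace-only runs between tags. Joining with the empty string and calling strip() turns that into a single clean value per cell.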

Here is a link to the findAll documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html#arg-text
