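I have an XHTML file formatted as shown below. I am trying to extract, in order, all the text between the tags. I can get everything except BAC by calling this_list = get_e('td') and then passing that list to another function, get_text(this_list), to get the text. I'd like to know whether a slight modification to my functions would get all of the text. Can anyone offer some suggestions?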
<tr>
<td colspan="1" rowspan="1" class="lft">
<a shape="rect" href="http://www.usatoday.idmanagedsolutions.com/stocks/new/quote.idms?SYMBOL_US=BAC">
BAC</a>
</td>
<td colspan="1" rowspan="1" class="lft">
Bank Of America Corporation</td>
<td colspan="1" rowspan="1">
9.79
</td>
<td colspan="1" rowspan="1">
-0.07
</td>
<td colspan="1" rowspan="1">
<span class="neg-arrw">
-0.71%
</span>
</td>
<td colspan="1" rowspan="1">
71,370,166
</td>
</tr>
<tr class="evenrow">
<td colspan="1" rowspan="1" class="lft">
VALE
</td>
<td colspan="1" rowspan="1" class="lft">
Vale S A
</td>
<td colspan="1" rowspan="1">
17.52
</td>
<td colspan="1" rowspan="1">
+0.09
</td>
<td colspan="1" rowspan="1">
<span class="pos-arrw">
+0.49%
</span>
</td>
<td colspan="1" rowspan="1">
15,461,788</td>
</tr>
I am using the following functions:
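# dom is assumed to be an already-parsed xml.dom.minidom document
def get_e(tag):
    # collect the direct child nodes of every <tag> element
    l = []
    els = dom.getElementsByTagName(tag)
    for e in els:
        for child_el in e.childNodes:
            l.append(child_el)
    return l

def get_text(els):
    # keep only the data of the text nodes
    l = []
    for e in els:
        if e.nodeType == e.TEXT_NODE:
            l.append(e.data)
    return l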
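
One possible modification, as a minimal sketch: instead of looking only at the direct children of each td, walk each cell recursively so that text nested inside an a or span element is collected too. This assumes dom is an xml.dom.minidom document. The names node_text, get_all_text and SAMPLE are hypothetical, and SAMPLE is a trimmed copy of the first row above, wrapped in a table root so it parses on its own.

```python
from xml.dom import minidom

# Hypothetical sample: a trimmed copy of the first row, wrapped in a root element.
SAMPLE = """<table>
<tr>
<td colspan="1" rowspan="1" class="lft">
<a shape="rect" href="http://www.usatoday.idmanagedsolutions.com/stocks/new/quote.idms?SYMBOL_US=BAC">
BAC</a>
</td>
<td colspan="1" rowspan="1" class="lft">
Bank Of America Corporation</td>
<td colspan="1" rowspan="1">
<span class="neg-arrw">
-0.71%
</span>
</td>
</tr>
</table>"""


def node_text(node):
    """Recursively join the text under node, including nested elements."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(node_text(child) for child in node.childNodes)


def get_all_text(dom, tag):
    """Return the stripped text of every <tag> element, in document order."""
    return [node_text(el).strip() for el in dom.getElementsByTagName(tag)]


dom = minidom.parseString(SAMPLE)
cells = get_all_text(dom, "td")
print(cells)  # ['BAC', 'Bank Of America Corporation', '-0.71%']
```

Because the recursion descends into every child element, the link text BAC and the percentage inside the span come back alongside the plain cells, one string per td.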