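I'm trying to parse an HTML document to retrieve all of its footnotes; the file contains a few dozen. I don't know what expression to use to extract everything I want. The problem is that the class names (e.g. "calibre34") are random in each document. The only way to locate a footnote is to search for "[hide]"; the footnote text always comes right after it and is closed with a </td> tag. Below is an example of one of the footnotes in the HTML document; all I want is the text. Any ideas? Thanks a lot!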
<td class="calibre33">1.<span><a class="x-xref" href="javascript:void(0);">
[hide]</a></span></td>
<td class="calibre34">
Among the other factors on which the premium would be based are the
average size of the losses experienced, a margin for contingencies,
a loading to cover the insurer's expenses, a margin for profit or
addition to the insurer's surplus, and perhaps the investment
earnings the insurer could realize from the time the premiums are
collected until the losses must be paid.</td>
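
One possible way to approach this, as a rough sketch rather than a definitive solution: since the class names cannot be relied on, walk the `<td>` cells in order and treat the cell that follows a cell containing "[hide]" as the footnote text. The example below uses only Python's standard-library `html.parser`. The names `FootnoteParser` and `extract_footnotes` are made up for illustration, and the sketch assumes every footnote follows the layout of the sample above (a marker cell immediately followed by a text cell, with no nested tables).

```python
from html.parser import HTMLParser


class FootnoteParser(HTMLParser):
    """Collect the text of every <td> that directly follows a <td> containing "[hide]"."""

    def __init__(self):
        super().__init__()
        self.footnotes = []          # extracted footnote texts, in document order
        self._in_td = False          # True while inside a <td>
        self._buffer = []            # text fragments of the current <td>
        self._capture_this = False   # True if the current <td> is a footnote body
        self._marker_seen = False    # True if the previous <td> contained "[hide]"

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buffer = []
            # The cell right after a "[hide]" cell is assumed to hold the footnote.
            self._capture_this = self._marker_seen
            self._marker_seen = False

    def handle_data(self, data):
        if self._in_td:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self._in_td:
            return
        # Join the fragments and collapse line breaks and runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if self._capture_this:
            self.footnotes.append(text)
        elif "[hide]" in text:
            self._marker_seen = True
        self._in_td = False


def extract_footnotes(html):
    """Return a list of footnote strings found in the given HTML text."""
    parser = FootnoteParser()
    parser.feed(html)
    parser.close()
    return parser.footnotes
```

Reading the file into a string and passing it to `extract_footnotes` should return one string per footnote, with the line breaks inside each footnote collapsed into single spaces. If the real documents nest tables inside these cells, or put anything between the marker cell and the text cell, the matching logic would need to be adjusted.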