I want to extract several values from some HTML, and I think XPath might be the ideal way to do it.
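My plan is to loop over each `tr` with the class `data`, and then, inside the loop, grab the data I need: the `route_number`, the `a` text (which is also in its `title`), and the text of the `via` span.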
The HTML looks like this:

```html
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
<span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
</td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a>
<span class="via"><strong>via</strong> Yardley Wood Road</span>
</td>
</tr>
```
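A minimal sketch of the loop-per-row plan, assuming Python's standard-library `xml.etree.ElementTree` (which supports only a limited subset of XPath) and assuming the markup is well-formed enough to parse as XML once wrapped in a `<table>` root. The names `HTML` and `parse_routes` are hypothetical. The outer query selects each `tr class="data"` once, and the inner queries are relative to that row, so one row's route number, anchor text and via text can never be paired with another row's.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the two rows from the question wrapped in a <table>
# root so ElementTree can parse them. Real pages that are not well-formed
# would need a forgiving HTML parser (e.g. lxml.html) instead.
HTML = """<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
<span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
</td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a>
<span class="via"><strong>via</strong> Yardley Wood Road</span>
</td>
</tr>
</table>"""


def parse_routes(markup):
    """Return one dict per <tr class="data"> row (hypothetical helper)."""
    root = ET.fromstring(markup)
    routes = []
    # Outer query: select every data row exactly once.
    for row in root.findall(".//tr[@class='data']"):
        # Inner queries are relative to the current row.
        number = row.find("th[@class='route_number']/a/span")
        link = row.find("td[@class='main_and_via']/a")
        via = row.find("td[@class='main_and_via']/span[@class='via']")
        routes.append({
            "route_number": number.text,
            "title": link.get("title"),
            "anchor_text": link.text,
            # itertext() joins the <strong>via</strong> label with the
            # text that follows it inside the span.
            "via": "".join(via.itertext()).strip(),
        })
    return routes


for r in parse_routes(HTML):
    print(r["route_number"], r["via"])
```

Run as-is, this prints `1 via Dudley Road and Tettenhall Road` and `2 via Yardley Wood Road`. If some rows may lack a `via` span, `row.find(...)` returns `None` for that row, so a `None` check before reading `.text` would be needed.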
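Is looping over each `tr` and then running separate queries for the route number, the anchor text and the via text the ideal approach, or can this be done with a single XPath query?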
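For comparison, a sketch of the "no per-row loop" alternative: document-wide queries whose results are paired up by position. It again assumes the standard-library `xml.etree.ElementTree`, whose path subset has no `|` union operator, so this uses three flat queries rather than literally one expression. With a full XPath 1.0 engine, a union such as `//tr[@class='data']/th/a/span | //tr[@class='data']/td/a` returns a single node-set, which most APIs hand back in document order, but the caller still has to regroup the nodes by position, with the same alignment risk. `HTML` and `parse_routes_flat` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the question's two rows wrapped in a <table> root.
HTML = """<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
<span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
</td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a>
<span class="via"><strong>via</strong> Yardley Wood Road</span>
</td>
</tr>
</table>"""


def parse_routes_flat(markup):
    """Pair route data by position instead of looping over rows."""
    root = ET.fromstring(markup)
    # Three document-wide queries; no per-row context.
    numbers = root.findall(".//tr[@class='data']/th[@class='route_number']/a/span")
    links = root.findall(".//tr[@class='data']/td[@class='main_and_via']/a")
    vias = root.findall(
        ".//tr[@class='data']/td[@class='main_and_via']/span[@class='via']")
    # Positional pairing is only safe when every row has all three elements;
    # a row without a via span would shift every later pairing. Unequal
    # counts are the one symptom of that we can detect here.
    if not (len(numbers) == len(links) == len(vias)):
        raise ValueError("rows are missing elements; positional pairing is unsafe")
    return [
        (n.text, a.text, "".join(v.itertext()).strip())
        for n, a, v in zip(numbers, links, vias)
    ]
```

The per-row loop costs a few more queries but keeps each row's values together even when a row is incomplete, which is usually the safer trade-off for scraping.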