xpath - XPath following-sibling for crawling does not return the sibling

Question

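I'm trying to create a crawler that extracts attribute data from a supplier website so that I can audit it against our internal attribute database, and I'm new to import.io. I've watched a bunch of videos, but although my syntax seems correct, my manual XPath override isn't returning the attribute value. I have the following sample HTML: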

<table>
<tbody><tr class="oddRow">
<td class="label">&nbsp;Adhesive Type&lrm;</td><td>&nbsp;Epoxy&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Applications&lrm;</td><td>&nbsp;Hard Disk Drive Component Assembly&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Capabilities&lrm;</td><td>&nbsp;Sustainability&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm;
</td>

I'm trying to write a following-sibling XPath expression to get the "Color" value through the import.io crawler. The XPath I get when I select "Color" is:

//*[@id="attributeList"]/table/tbody/tr[5]/td[1]

I have tried using:

//*[@id="attributeList"]/table/tbody/tr/td[.="Color"]/following-sibling::td

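But it doesn't pick up the color attribute value from the table. I'm not sure whether this has something to do with the odd/even row classes. Looking at the HTML, the expression seems logical: the label is "Color" and the attribute value is in the td that follows it.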

score 7 · Accepted Answer

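The text in the td node you are selecting is not just "Color". In the sample HTML it is &nbsp;Color&lrm;, so the exact comparison .="Color" never matches. Instead, select the td node whose text contains the string "Color":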

'//*[@id="attributeList"]/table/tbody/tr/td[contains(text(), "Color")]/following-sibling::td/text()'
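
To see why the equality test fails while contains() succeeds, here is a minimal Python sketch that only looks at the label text itself. It is not import.io code and does not evaluate XPath; it assumes the HTML parser decodes the two entities from the sample row before the XPath is applied, and the variable names are purely illustrative.

```python
import html

# Raw cell content copied from the "Color" row of the sample table.
raw_label = "&nbsp;Color&lrm;"

# Decode the entities the way an HTML parser would:
# &nbsp; becomes U+00A0 (no-break space), &lrm; becomes U+200E (left-to-right mark).
label_text = html.unescape(raw_label)

# Roughly what td[.="Color"] tests: exact string equality.
exact_match = (label_text == "Color")

# Roughly what td[contains(text(), "Color")] tests: a substring match.
substring_match = ("Color" in label_text)

print(repr(label_text))
print(exact_match)      # False
print(substring_match)  # True
```

Note that normalize-space() would probably not help here either, because in XPath 1.0 it only strips space, tab, carriage return and line feed, not the no-break space or the left-to-right mark.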
