xpath - Using a relative XPath to scrape a custom div attribute

Question

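I have a few hundred URLs, and I am trying to scrape the path of an image from each page. Every page has the same format, but the div class is unique to each page.

I would like to use IMPORTXML in Google Sheets to scrape the contents of the data-path attribute.

I have tried to extract the URL with XPath, but without success.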

<div class="uniqueid active" data-path="/~/media/Images/image.jpg" data-alt="Anything"></div>

For example: //div[@class='*']/@data-path

score 0 · Accepted Answer

If the div class follows the pattern "uniqueid active", you can try the following XPath:

//div[contains(@class, "active")]/@data-path

Otherwise, if the div class can be anything, use the following query:

//div[@class]/@data-path
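
As a rough illustration of how these two queries behave, here is a minimal Python sketch using lxml. The sample markup extends the div from the question with two extra, made-up divs (the "inactive" one and the one without a class are assumptions added only for comparison). It also shows a stricter token match, since contains(@class, "active") is a plain substring test and would match a class such as "inactive" as well.

```python
from lxml import html

# Sample markup: the first div is from the question; the other two are
# hypothetical additions used only to show how the queries differ.
SAMPLE = """
<html><body>
<div class="uniqueid active" data-path="/~/media/Images/image.jpg" data-alt="Anything"></div>
<div class="other inactive" data-path="/~/media/Images/hidden.jpg"></div>
<div data-path="/~/media/Images/noclass.jpg"></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Substring match on the class attribute: also matches "inactive".
loose = tree.xpath('//div[contains(@class, "active")]/@data-path')

# Any div that has a class attribute at all.
any_class = tree.xpath('//div[@class]/@data-path')

# Whole-token match: pad the class list with spaces and look for " active ".
strict = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " active ")]'
    '/@data-path'
)

print(loose)
print(any_class)
print(strict)
```

If the pages can contain other divs whose class merely contains the letters "active", the stricter token form may be the safer choice.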

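Update:

I tried using IMPORTXML to get the data-path attribute value, but without success. I then tried doing it with Python (requests and lxml) and it worked. So the problem may be in Google Sheets: some limitation or bug, idk.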
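
The exact Python code is not shown here, so the following is only a sketch of what a requests plus lxml approach could look like. The function names, the timeout value, and the example URL in the comment are hypothetical; only the XPath expression comes from the answer.

```python
import requests
from lxml import html

# XPath taken from the answer above.
DATA_PATH_XPATH = '//div[contains(@class, "active")]/@data-path'


def extract_data_paths(page_source, xpath=DATA_PATH_XPATH):
    """Return every data-path value matched by the XPath in the given HTML."""
    tree = html.fromstring(page_source)
    return [str(value) for value in tree.xpath(xpath)]


def fetch_data_paths(url, timeout=10):
    """Download one page and extract its data-path values (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_data_paths(response.content)


# Usage sketch (the URL is a placeholder, not from the original post):
# for url in ["https://example.com/page-1"]:
#     print(url, fetch_data_paths(url))
```

Keeping the parsing step separate from the download makes it possible to check the XPath against saved HTML before running it over a few hundred URLs.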
