python - Need help writing an XPath string to match multiple (but not all) table cells

Question

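Note: this question has been updated since some early answers were given. It is still the same question, just hopefully clearer.

I am trying to get a site scraper working, but I am having trouble coming up with a suitable XPath string for certain table cells.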

<tbody>
  <tr>
    <td class="Label" width="20%" valign="top">Uninteresting section</td>
    <td class="Data"> I don't care about this</td>
  </tr>
  <tr>
    <td></td>
    <td class="Data"> I don't care about this</td>
  </tr>
  <tr>
    <td class="Label" width="20%" valign="top">Interesting section</td>
    <td class="Data"> I want this-1</td>
  </tr>
  <tr>
    <td></td>
    <td class="Data"> I want this-2</td>
  </tr>
  <tr>
    <td></td>
    <td class="Data"> I want this-n</td>
  </tr>
  <tr>
    <td class="Label" width="20%" valign="top">Uninteresting section</td>
    <td class="Data"> I don't care about this</td>
  </tr>
  <tr>
    <td></td>
    <td class="Data"> I don't care about this</td>
  </tr>
</tbody>

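I want the contents of all the Data cells in the "Interesting section". There can be any number of them. I don't care about anything else in the markup, but I need all of these.

In the example above, that means: I want this-1, I want this-2, I want this-n.

In case it matters, I am using xml.dom.minidom and py-dom-xpath with Python 2.7.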

score 1 · Accepted Answer

You can get all n tds that follow the section's label (this includes the tds of the later sections):

 //tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class = "Data"]/text()

Then you can get all m tds that belong to the following sections, which you don't want:

//tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class="Label"][1]/following::td[@class = "Data"]/text()

Then, in Python, you can keep just the first n - m tds.
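The "first n - m" step can be sketched as a plain list slice. This is only an illustration: the two lists below are hypothetical stand-ins for the results of the two queries above (whitespace stripped), and `first_n_minus_m` is a made-up helper name, not part of py-dom-xpath.

```python
# Hypothetical results of the two queries above, in document order.
# In real use these would come from whatever XPath engine is in use.
after_label = [
    "I want this-1", "I want this-2", "I want this-n",
    "I don't care about this", "I don't care about this",
]
after_next_label = ["I don't care about this", "I don't care about this"]


def first_n_minus_m(all_after, unwanted_tail):
    """Return the leading items of all_after that are not in the unwanted tail.

    Relies on both lists being in document order, so the unwanted cells
    are exactly the last len(unwanted_tail) items of all_after.
    """
    keep = len(all_after) - len(unwanted_tail)
    return all_after[:keep]


wanted = first_n_minus_m(after_label, after_next_label)
print(wanted)
```

If the interesting section happens to be the last one, the second query returns nothing, m is 0, and the slice keeps everything.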

You can try doing the same thing within XPath itself, using the position and count functions:

  //tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class = "Data"][position() <= (count(//tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class = "Data"]/text())  - count(//tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class="Label"][1]/following::td[@class = "Data"]/text()) )]/text()

If you have XPath 2.0, you can do it more elegantly with the except operator:

 //tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class = "Data"]/text() except  //tr[@class="Entry"]//tr/td[contains(text(), "Section title")]/following::td[@class="Label"][1]/following::td[@class = "Data"]/text()
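If no XPath 2.0 engine is available, the same selection can be cross-checked with a plain DOM walk. The sketch below swaps XPath for a loop over `xml.dom.minidom` nodes; it assumes a condensed copy of the question's sample markup, matches the label text exactly rather than with `contains()`, and uses hypothetical helper names (`cell_text`, `section_data`).

```python
from xml.dom import minidom, Node

# Condensed copy of the sample markup from the question (assumption:
# the real page parses as well-formed XML).
SAMPLE = """<tbody>
  <tr><td class="Label">Uninteresting section</td><td class="Data"> I don't care about this</td></tr>
  <tr><td></td><td class="Data"> I don't care about this</td></tr>
  <tr><td class="Label">Interesting section</td><td class="Data"> I want this-1</td></tr>
  <tr><td></td><td class="Data"> I want this-2</td></tr>
  <tr><td></td><td class="Data"> I want this-n</td></tr>
  <tr><td class="Label">Uninteresting section</td><td class="Data"> I don't care about this</td></tr>
  <tr><td></td><td class="Data"> I don't care about this</td></tr>
</tbody>"""


def cell_text(td):
    """Join the text nodes directly inside a td and strip whitespace."""
    parts = [n.data for n in td.childNodes if n.nodeType == Node.TEXT_NODE]
    return "".join(parts).strip()


def section_data(doc, title):
    """Collect Data cells from the Label cell equal to `title` up to the next Label."""
    wanted = []
    inside = False
    # getElementsByTagName returns the cells in document order.
    for td in doc.getElementsByTagName("td"):
        cls = td.getAttribute("class")
        if cls == "Label":
            # Every Label cell starts a new section; remember whether it is ours.
            inside = cell_text(td) == title
        elif cls == "Data" and inside:
            wanted.append(cell_text(td))
    return wanted


doc = minidom.parseString(SAMPLE)
print(section_data(doc, "Interesting section"))
```

On the sample this yields the three "I want this" cells, the same set the two-query difference is meant to produce.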

score 0 · Accepted Answer


//tr[@class="Entry"]/td[@class="Data"]/text()

Answered 2012-07-25T14:13:36.953

score 0 · Accepted Answer

//tbody[tr/td[contains(text(),"Section title")]]/tr/td[@class="Data"]/text()

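Update. Here is what this does:

It gets every tbody that contains a tr with a td whose text contains "Section title".
From those, it gets the text of every td with class="Data".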
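Those two steps can be mimicked with the standard library as a rough illustration. The markup below is hypothetical (two tbody elements, invented for this sketch), `data_in_matching_tbodies` is a made-up name, and `xml.etree.ElementTree` stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two tbody elements, invented for illustration.
SAMPLE = """<table>
<tbody>
  <tr><td class="Label">Section title</td><td class="Data">first</td></tr>
  <tr><td></td><td class="Data">second</td></tr>
</tbody>
<tbody>
  <tr><td class="Label">Other</td><td class="Data">third</td></tr>
</tbody>
</table>"""


def data_in_matching_tbodies(root, title):
    """Return the text of class="Data" cells inside tbody elements that mention title."""
    results = []
    for tbody in root.iter("tbody"):
        # Step 1: keep only tbody elements with a tr/td whose text contains the title.
        if not any(title in (td.text or "") for td in tbody.findall("tr/td")):
            continue
        # Step 2: from those, take the text of every td with class="Data".
        results.extend(td.text for td in tbody.findall("tr/td[@class='Data']"))
    return results


root = ET.fromstring(SAMPLE)
print(data_in_matching_tbodies(root, "Section title"))
```

Note that every Data cell inside a matching tbody is returned, so when all sections share a single tbody, as in the question's sample, the cells of the other sections come back as well.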
