css - Ruby Mechanize: get the element that contains specified text

Question

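I am trying to parse the content of a website with Mechanize, but I am stuck. The content I want to parse is inside li tags, and they are not always in the same order.

Suppose we have the following, where the order of the li tags is not always the same and some of them may not be present at all.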

<div class="details">
  <ul>
    <li><span>title 1</span> ": here are the details"</li>
    <li><span>title 2</span> ": here are the details"</li>
    <li><span>title 3</span> ": here are the details"</li>
    <li><span>title 4</span> ": here are the details"</li>
  </ul>
</div>

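What I want is to get only the details of the li whose span text is title 3. What I have done is the following, which gives me the details from the first li: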

puts page.at('.details').at('span', :text => "title 3").at("+ *").text

Is there a way to do what I want with Mechanize, or should I use something else as well?

score 19 · Accepted Answer

page.search(".details").at("span:contains('title 3')").parent.text

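Explanation: at accepts either CSS or XPath selectors. To keep the approach readable and close to yours, this answer uses a CSS selector, but the problem is that standard CSS cannot select by text. Thanks to Nokogiri, you can use jQuery-style selectors, so the contains pseudo-class is allowed.

The selection returns the span element, so to get its parent li element you can call the parent method and then easily get the text.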

score 2 · Accepted Answer

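Since you want to do this with Mechanize (I see one of the comments suggests using Nokogiri instead), you should know that Mechanize is built on Nokogiri, so you can use any and all Nokogiri functionality through Mechanize.

As the documentation at http://mechanize.rubyforge.org/Mechanize.html shows: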

Mechanize.html_parser = Nokogiri::XML

So you can do this with XPath and Mechanize's page.search method.

page.search("//div[@class='details']/ul/li[span='title 3']").text

This should give you the text of the li element you are looking for. (Not verified with .text, but the XPath does work.)

You can test the XPath here: http://www.xpathtester.com/saved/51c5142c-dbef-4206-8fbc-1ba567373fb2

score 1 · Accepted Answer

A cleaner CSS approach:

page.at('.details li:has(span[text()="title 3"])')

answered 2013-09-28T05:07:25.347

score 0 · Accepted Answer

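Based on the comments, I think you are looking for something like the following.

"As I said, the problem is that it gives me the first li, whereas I want the one whose text is title 3"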

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-eotl
<div class="details">
  <ul>
    <li><span>title 1</span> ": here are the details"</li>
    <li><span>title 2</span> ": here are the details"</li>
    <li><span>title 3</span> ": here are the details"</li>
    <li><span>title 4</span> ": here are the details"</li>
  </ul>
</div>
eotl

node = doc.at_xpath("//div[@class='details']//span[contains(.,'title 3')]/..")
node.name # => "li"
puts node.to_html  
# <li>
# <span>title 3</span> ": here are the details"</li>
puts node.children
#<span>title 3</span>
# ": here are the details"
