ruby-on-rails - Selecting an HTML block by its text with Nokogiri?

Question

I have the following HTML block:

<tr>
   <th>Consignment Service Code</th>
   <td>ND16</td>
</tr>

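What I ultimately want to pull out is the ND16 string, but to do that I need to select the <tr> based on the text Consignment Service Code.

I'm already using Nokogiri to parse the HTML, so it would be great to keep using it.

So, how can I select this HTML block based on the text "Consignment Service Code"?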

score 2 · Accepted Answer

You can do it like this:

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-eot
<tr>
   <th>Consignment Service Code</th>
   <td>ND16</td>
</tr>
eot

node = doc.at_xpath("//*[text()='Consignment Service Code']/following-sibling::*[1]")
puts node.text
# >> ND16

Here is a further experiment that may help you get started:

## parent node
parent_node = doc.at_xpath("//*[text()='Consignment Service Code']/..")
puts parent_node.name # => tr

## to get the child td (relative path; "//td" would search the whole document)
puts parent_node.at_xpath("./td").text # => ND16

puts parent_node.to_html

#<tr>
#<th>Consignment Service Code</th>
#   <td>ND16</td>
#</tr>

score 1 · Accepted Answer

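Here is another way.

Use Nokogiri's css method to find the tr nodes, then select those whose th tag contains the desired text. Finally, take the selected nodes and extract the td values: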

require 'nokogiri'

str = '<tr>
   <th>Consignment</th>
   <td>ND15</td>
</tr>
<tr>
   <th>Consignment Service Code</th>
   <td>ND16</td>
</tr>
<tr>
   <th>Consignment Service Code</th>
   <td>ND17</td>
</tr>'

doc = Nokogiri::HTML.parse(str)
nodes = doc.css('tr')
           .select { |el|
             # \A and \z anchor the whole string; ^ and $ only anchor lines in Ruby
             el.css('th').text =~ /\AConsignment Service Code\z/
           }

nodes.each do |el|
  p el.css('td').text
end

The output is:

"ND16"
"ND17"
