I'm scraping text from a web page with code like this:

doc.xpath("//td[text()='Operating system']/following-sibling::td")
doc.xpath("//td[text()='Processors']/following-sibling::td")

I have about 30 of these, so I thought I could use an array, but it doesn't work. Here is my code:

clues = Array.new
clues << 'Operating system'
clues << 'Processors'
clues << 'Chipset'

clues.each do |clue_storeage|
  doc.xpath("//td[text()=#{clues}]/following-sibling::td")
end

Is there a way to feed the array into that loop and then output the results to CSV?

1 Answer

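To clarify mb2nd's comment: your each block references the array incorrectly. It interpolates clues (the whole array) instead of the block variable. This should work: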

clues.each do |clue|
  # Quotes make the clue an XPath string literal: text()='Chipset'
  doc.xpath("//td[text()='#{clue}']/following-sibling::td")
end
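Interpolating a clue into the XPath expression only works if the value ends up inside quotes as an XPath string literal, and a single-quoted literal breaks as soon as a clue contains an apostrophe (XPath 1.0 has no escape character). A minimal sketch, assuming a hypothetical helper xpath_literal (not part of Nokogiri) that picks a quote character the value doesn't contain and falls back to concat() when it contains both:

```ruby
# Hypothetical helper (not a Nokogiri API): turn a Ruby string into an
# XPath 1.0 string literal, choosing quotes the value does not contain.
def xpath_literal(str)
  return "'#{str}'" unless str.include?("'")
  return "\"#{str}\"" unless str.include?('"')
  # Contains both quote kinds: split on ' and glue the pieces back
  # together with concat(), inserting "'" between them.
  parts = str.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q(, "'", ))})"
end

# Hypothetical wrapper that builds the same expression used in the loop.
def clue_xpath(clue)
  "//td[text()=#{xpath_literal(clue)}]/following-sibling::td"
end

clues = ['Operating system', 'Processors', 'Chipset']
clues.each { |clue| puts clue_xpath(clue) }
```

Inside the loop you would then call doc.xpath(clue_xpath(clue)).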

To output the captured data as CSV, you could run:

csv = ""
clues.each do |clue|
  # xpath returns a NodeSet; .text turns it into a String that << accepts
  csv << doc.xpath("//td[text()='#{clue}']/following-sibling::td").text
  csv << ", " unless clues.last == clue
end
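Building the line by hand has two weak spots: the unless clues.last == clue check also skips the separator after any earlier clue that happens to equal the last one, and a scraped value that itself contains a comma turns into two columns. A minimal sketch using Array#join and Ruby's standard csv library instead; the scraped Hash and its placeholder values are hypothetical stand-ins for the doc lookups, so the example runs without fetching a page:

```ruby
require 'csv'

clues = ['Operating system', 'Processors', 'Chipset']

# Hypothetical stand-in for the scraped page. In the real scraper each value
# would come from a doc.at_xpath(...) lookup; placeholders keep this runnable.
scraped = {
  'Operating system' => 'os-value',
  'Processors'       => 'cpu-value, 3.1 GHz', # contains a comma on purpose
  'Chipset'          => 'chipset-value'
}

# Look up every clue in order; a missing clue becomes an empty string.
values = clues.map { |clue| scraped.fetch(clue, '') }

# Array#join needs no "is this the last clue?" check.
naive = values.join(', ')

# CSV.generate_line quotes any field containing a comma and appends "\n".
line = CSV.generate_line(values)

puts naive
puts line
```

The naive string cannot be split back into three fields reliably, while the generate_line output round-trips through CSV.parse_line.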

Note that doc.xpath(...) returns a Nokogiri::XML::NodeSet rather than a string, so call .text on the result before appending it to a string.

As a side note, you can also populate your array like this:

clues = ['Operating system', 'Processors', 'Chipset']  

EDIT (after @Ninja2K's last comment)

You need to capture the result of each XPath call. Here is some working code:

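require 'nokogiri'
require 'open-uri'

# URI.open comes from open-uri; plain Kernel#open stopped opening URLs in Ruby 3.0.
doc = Nokogiri::HTML(URI.open("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1%22"))

clues = ['Operating system', 'Processors', 'Chipset']

csv_text = ""
clues.each do |clue|
  # at_xpath returns the first matching <td>, or nil if the clue is not on the page
  cell = doc.at_xpath("//td[text()='#{clue}']/following-sibling::td")
  # .text extracts the cell's string; a missing clue becomes an empty field
  csv_text << (cell ? cell.text.strip : "")
  csv_text << ", " unless clues.last == clue
end
puts csv_text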
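To write an actual .csv file rather than print a string, Ruby's standard csv library can add a header row and keep columns aligned when a clue is missing. A minimal sketch, assuming the scraped values have already been collected into one Hash per product page (the pages data and the specs.csv filename are hypothetical):

```ruby
require 'csv'

clues = ['Operating system', 'Processors', 'Chipset']

# Hypothetical scraped results: one Hash per product page, keyed by clue.
# In the real scraper each Hash would be filled from doc.at_xpath(...) lookups.
pages = [
  { 'Operating system' => 'os-1', 'Processors' => 'cpu-1', 'Chipset' => 'chip-1' },
  { 'Operating system' => 'os-2', 'Processors' => 'cpu-2' } # Chipset missing
]

# One header row (the clues), then one row per page. page[clue] is nil for a
# missing clue, which CSV writes as an empty field so columns stay aligned.
CSV.open('specs.csv', 'w') do |csv|
  csv << clues
  pages.each do |page|
    csv << clues.map { |clue| page[clue] }
  end
end

puts File.read('specs.csv')
```

With about 30 clues this scales without changes: extend the clues array, and each page contributes one row in the same column order.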

By the way, you may also find this article useful: http://hunterpowers.com/data-scraping-and-more-with-ruby-nokogiri-sinatra-and-heroku/

answered 2012-07-17T14:46:29.390