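I'm trying to write a program that takes a Wikipedia link as input and follows the first link on that page. It keeps going until the current page matches a second input. Eventually I'll add a check that terminates the program when it runs into a loop.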

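Right now my code works for examples with only a few links, such as Bee -> History, but it gives me an error on longer paths. Here is the code. I only started learning Ruby yesterday, so I'd appreciate it if you pointed out any mistakes.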

require 'open-uri'
require 'nokogiri'

puts "Enter starting page (full URL not needed): "
page1 = gets.chomp

puts "Enter ending page (full URL not needed): "
page2 = gets.chomp

until page1 == page2 do
  #open page
  doc = Nokogiri::HTML(open("http://en.wikipedia.org/wiki/" + page1))

  %w[.//table .//span .//sup .//i].map {|n| doc.xpath(n).map(&:remove) }

  #find href in first p
  fp = doc.css("p").first.search('a').map{ |a| a['href']}

  #make page1 = the end of the url. ex. /wiki/link = link
  page1 = fp.first[6,fp.first.length]
  puts page1
end

Update: this is the error I get:

C:\Users\files>ruby 121.rb
Enter starting page (full URL not needed):
Cow
Enter ending page (full URL not needed):
Philosophy
Domestication
Latin_(language)
Classical_antiquity
History
121.rb:20:in `<main>': undefined method `length' for nil:NilClass (NoMethodError)
1 Answer

Also, to solve your task, you could process every link on each page until you reach page2:

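require 'open-uri'
require 'nokogiri'
require 'set'

puts "Enter starting page (full URL not needed): "
start_page = gets.chomp

puts "Enter ending page (full URL not needed): "
end_page = gets.chomp

pages = [start_page]  # pages still to visit (pop makes this a stack)
visited = Set.new     # pages already fetched, so link cycles terminate
next_page = pages.first

until next_page == end_page || pages.empty?
  next_page = pages.pop
  next unless visited.add?(next_page)  # skip pages we have already treated
  puts "Treat: #{next_page}"

  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
  doc = Nokogiri::HTML(URI.open("http://en.wikipedia.org/wiki/" + next_page))

  # drop tables, spans, superscripts and italics before collecting links
  %w[.//table .//span .//sup .//i].each { |n| doc.xpath(n).each(&:remove) }

  doc.css("p").each do |p|
    p.search('a').each do |a|
      href = a['href']
      # keep article links only; skip anchors without href, footnotes, external URLs
      pages.push(href.delete_prefix('/wiki/')) if href&.start_with?('/wiki/')
    end
  end
end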
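
As for the error itself: line 20 of your script is `page1 = fp.first[6,fp.first.length]`, and `fp` is an empty array whenever the first `<p>` has no links left after the tables, spans and so on are removed, so `fp.first` is `nil`. Below is a rough sketch of how the first-link walk could guard against that and also stop on a loop. The names `wiki_title`, `follow_first_links` and `LINKS` are my own and purely illustrative, and the block stands in for the page fetch so the logic can be tried without touching the network.

```ruby
require 'set'

# Turn an href such as "/wiki/History" into "History".
# Returns nil for a missing href or anything that is not an article link.
def wiki_title(href)
  return nil unless href.is_a?(String) && href.start_with?('/wiki/')
  href.delete_prefix('/wiki/')
end

# Follow the first article link of each page, starting at start_page.
# The block receives a page title and must return that page's hrefs
# in document order (assumption: the caller does the fetching/parsing).
# Returns [status, path], where status is :found, :loop or :dead_end.
def follow_first_links(start_page, end_page)
  path = [start_page]
  seen = Set.new(path)
  current = start_page

  until current == end_page
    hrefs = yield(current)
    # first usable link on the page, or nil when there is none
    nxt = hrefs.map { |h| wiki_title(h) }.compact.first
    return [:dead_end, path] if nxt.nil?

    path << nxt
    # Set#add? returns nil when the title was already seen
    return [:loop, path] unless seen.add?(nxt)
    current = nxt
  end

  [:found, path]
end

# Hypothetical in-memory link table used instead of live pages.
LINKS = {
  'Cow' => ['#cite_note-1', '/wiki/Domestication'],
  'Domestication' => [nil, '/wiki/History'],
  'History' => [],
}

p follow_first_links('Cow', 'History') { |page| LINKS.fetch(page, []) }
```

With Nokogiri, the block could return something like `doc.css('p a').map { |a| a['href'] }`, which walks the links of every paragraph in order, so an opening paragraph without links is simply skipped rather than producing `nil`.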
answered 2013-06-10T06:07:38.010