I am trying to scrape data from http://expo.getbootstrap.com/

The HTML looks like this:

<div class="col-span-4">
  <p>
    <a class="thumbnail" target="_blank" href="https://www.getsentry.com/">
      <img src="/screenshots/sentry.jpg">
    </a>
  </p>
</div>

My Nokogiri code is:

url = "http://expo.getbootstrap.com/"
doc = Nokogiri::HTML(open(url))
puts doc.css("title").text
doc.css(".col-span-4").each do |site|
  title = site.css("h4 a").text
  href = site.css("a.thumbnail")[0]['href']
end

The goal is simple: get the link's href, the <img> tag's src, and the site's title. But it keeps reporting:

undefined method `[]' for nil:NilClass

on this line:

href = site.css("a.thumbnail")[0]['href']

This is really driving me crazy, because the same code works in another case.

2 Answers

I would do something like this:

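require 'nokogiri'
require 'open-uri'
require 'pp'

# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+.
doc = Nokogiri::HTML(URI.open('http://expo.getbootstrap.com/'))

# Start from each thumbnail link and walk up to its enclosing block for the title.
thumbnails = doc.search('a.thumbnail').map { |thumbnail|
  {
    href: thumbnail['href'],
    src: thumbnail.at('img')['src'],
    title: thumbnail.parent.parent.at('h4 a').text
  }
}

pp thumbnails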

Which, after running, outputs:

# => [
  {
    :href => "https://www.getsentry.com/",
    :src => "/screenshots/sentry.jpg",
    :title => "Sentry"
  },
  {
    :href => "http://laravel.com",
    :src => "/screenshots/laravel.jpg",
    :title => "Laravel"
  },
  {
    :href => "http://gruntjs.com",
    :src => "/screenshots/gruntjs.jpg",
    :title => "Grunt"
  },
  {
    :href => "http://labs.bittorrent.com",
    :src => "/screenshots/bittorrent-labs.jpg",
    :title => "BitTorrent Labs"
  },
  {
    :href => "https://www.easybring.com/en",
    :src => "/screenshots/easybring.jpg",
    :title => "Easybring"
  },
  {
    :href => "http://developers.kippt.com/",
    :src => "/screenshots/kippt-developers.jpg",
    :title => "Kippt Developers"
  },
  {
    :href => "http://www.learndot.com/",
    :src => "/screenshots/learndot.jpg",
    :title => "Learndot"
  },
  {
    :href => "http://getflywheel.com/",
    :src => "/screenshots/flywheel.jpg",
    :title => "Flywheel"
  }
]
answered 2013-05-28T14:28:31.853

You are not accounting for the fact that not all .col-span-4 divs contain a thumbnail. This should work:

require 'nokogiri'
require 'open-uri'

url = "http://expo.getbootstrap.com/"
doc = Nokogiri::HTML(URI.open(url)) # URI.open is required on Ruby 3.0+
puts doc.css("title").text
doc.css(".col-span-4").each do |site|
  title = site.css("h4 a").text
  thumbnail = site.css("a.thumbnail")
  next if thumbnail.empty? # skip blocks without a thumbnail link
  href = thumbnail[0]['href']
end
answered 2013-05-28T14:02:59.387