I have an HTML page:

<li id="user_432232" class="profile ">
  <section class="vcard clearfix">
    <div class="text">
      <div class="name">
      <h2 class="n fn">
        <a href="#" class="profile-link">Johww</a>
      </h2>

<div class="like-action like-action-user-432232">
  <div class="like" style=";">
    <span class="like-number" title="25 people like Jose">25</span>
  </div>
</div>
    </div>
      <p class="title">SCR</p>
    </div>
  </section>
</li>
<li id="user_432232" class="profile ">
  <section class="vcard clearfix">
    <div class="text">
      <div class="name">
      <h2 class="n fn">
        <a href="#" class="profile-link">Jose </a>
      </h2>

<div class="like-action like-action-user-432232">
  <div class="like" style=";">
    <span class="like-number" title="25 people like Jose">25</span>
  </div>
</div>
    </div>
      <p class="title">SCRT</p>
    </div>
  </section>
</li>

I need to scrape things like the name, title, and like count:

def find_page_data(url)
  doc = Nokogiri::HTML(open(html))
  data = [] 
  doc.css('.profile').each do |item|
    name= item.at_css("n fn").text
    like_no = item.at_css(".like-number").text
    title = item.css("p")[0].text
    data << [name,title,like_no]
  end
  data
end

The data comes back empty because doc.css('.profile') returns an empty array. I think this is because class="profile " ends with a space, so the selector can't match it.

1 Answer

Whitespace inside the class attribute is expected and works fine:

require 'nokogiri'

html = <<EOT
<html>
  <body>
    <p class="foo ">found foo</p>
    <p class="foo bar">found bar</p>
  </body>
</html>
EOT

doc = Nokogiri::HTML(html)
doc.at('.foo').to_html # => "<p class=\"foo \">found foo</p>"
doc.search('.foo').to_html # => "<p class=\"foo \">found foo</p><p class=\"foo bar\">found bar</p>"
doc.at('.bar').to_html # => "<p class=\"foo bar\">found bar</p>"

Notice that Nokogiri found .foo in the first two checks, as it should, and found .bar in the last check.

In every case the class attribute contained an embedded space.

Answered 2013-10-11T16:37:16.960