
I have something like this:

<div id="sub_div">
    <span class="subl">
      <div class="node">2204830011</div>
      <div class="node">1571827122</div>
      <div class="node">...</div>
      <div class="node">...</div>
      <div class="node">...</div>
    </span>
    <span class="subl">
      <div class="node">...</div>
      <div class="node">...</div>
      <div class="node">...</div>
      <div class="node">...</div>
      <div class="node">...</div>
    </span>
    <span class="subl">
      <div class="node">...</div>
      <div class="node">...</div>
      <div class="node">...</div>
    </span>
</div>

Right now, I'm doing this:

  def self.parse_nodes

    id       = @data.at_css("#n_info #clipnode").text unless @data.at_css("#n_info #clipnode").nil?
    name     = @data.at_css("#n_info .node_name").text unless @data.at_css("#n_info .node_name").nil?
    parent   = @data.at_css(".bc a").text unless @data.at_css(".bc a").nil?

    children_array = []
    children = @data.css('#sub_div')
    children.css('.subl').each do | child |
      child_id = child.css('.node').text[/[\d,]+/].to_i
      children_array ||= []
      children_array << child_id
    end 

    nodes_hash = "id: #{id}, name: #{name}, parent: #{parent}, children: #{children_array}"
    nodes_hash
  end

And I get something like this:

[220483001115718271223064201115857511158575013463330111571879115709231157103512258019011157197311570657115706941,
 220483001115718271223064201115857511158575013463330111571879115709231157103512258019011157197311570657115706941,
 220483001115718271223064201115857511158575013463330111571879115709231157103512258019011157197311570657115706941]

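I don't know why the code gives me the .node contents three times. Anyway, what I want to do is scrape the contents of every .node inside each .subl element and present them as an array: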

[2204830011, 1571827122, 3064201115, 8575111585, 7501346333,
0111571879, 1157092311, 5710351225, 8019011157, 1973115706,
57115706941]

Live page: http://www.findbrowsenodes.com/us/Apparel/1036682


3 Answers


Your code produces the following output:

require 'nokogiri'

html =<<END_OF_HTML
<div id="sub_div">
    <span class="subl">
      <div class="node">2204830011</div>
      <div class="node">1571827122</div>     
      <div class="node">...</div>    
      <div class="node">...</div>     
      <div class="node">...</div>      
    </span>

    <span class="subl">
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
    </span>
    <span class="subl">
      <div class="node">1</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
   </span>
</div>
END_OF_HTML

doc = Nokogiri::HTML(html)

children_array = []
children = doc.css('#sub_div')

children.css('.subl').each do | child |
  child_id = child.css('.node').text[/[\d,]+/].to_i
  children_array ||= []
  children_array << child_id
end 

p children_array

--output:--
[22048300111571827122, 0, 1]
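There is one entry per .subl span because the each loop body runs once per span and pushes exactly one value each time. A minimal plain-Ruby sketch of that behaviour, assuming the .node texts have already been extracted as strings (they are hard-coded below so it runs without Nokogiri), with join standing in for what NodeSet#text does:

```ruby
# Hard-coded stand-ins for the .node texts inside each of the three
# .subl spans in the HTML above (one inner array per span).
spans = [
  ["2204830011", "1571827122", "...", "...", "..."],
  ["...", "...", "...", "...", "..."],
  ["1", "...", "..."]
]

# Mirrors the loop body: join every .node text in the span, grab the first
# run of digits/commas, convert it. One value per span, so three entries.
# The all-"..." span has no digits, so the regex returns nil and nil.to_i is 0.
children_array = spans.map { |node_texts| node_texts.join[/[\d,]+/].to_i }

p children_array # => [22048300111571827122, 0, 1]
```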

The reason the numbers get concatenated is that when you write:

child.css('.node')

...you get a NodeSet containing all the divs with class="node". The text() method extracts all the text nodes in the NodeSet and concatenates their text with nothing in between:

require 'nokogiri'

html = "<div><span>hello</span><span>world</span></div>"
doc = Nokogiri::HTML(html)

spans = doc.css("span")
puts spans.text

--output:--
helloworld

So when you write:

child.css('.node').text

...you end up with many numbers joined together into a single string.
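To see the difference between joining first and extracting per node, here is a plain-Ruby sketch. The .node texts of the first .subl span are hard-coded as strings (an assumption standing in for what Nokogiri would return), join plays the role of NodeSet#text, and filter_map (Ruby 2.7+) pulls one number out of each node instead:

```ruby
# Hard-coded stand-ins for the texts of the five .node divs in the
# first .subl span (so the sketch runs without Nokogiri).
node_texts = ["2204830011", "1571827122", "...", "...", "..."]

# Roughly what NodeSet#text gives you: every node's text joined with no
# separator, so the regex then swallows both numbers as one digit run.
joined = node_texts.join
concatenated = joined[/[\d,]+/].to_i

# Extracting per node keeps the numbers apart; "..." has no digits,
# so the regex returns nil and filter_map drops it.
separate = node_texts.filter_map { |text| text[/\d+/]&.to_i }

p concatenated # => 22048300111571827122
p separate     # => [2204830011, 1571827122]
```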

Here is what you can do instead:

require 'nokogiri'

html =<<END_OF_HTML
<div id="sub_div">
    <span class="subl">
      <div class="node">2204830011</div>
      <div class="node">1571827122</div>     
      <div class="node">...</div>    
      <div class="node">...</div>     
      <div class="node">...</div>      
    </span>

    <span class="subl">
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
    </span>
    <span class="subl">
      <div class="node">3333333</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
   </span>
</div>
END_OF_HTML


doc = Nokogiri::HTML(html)
results = []

doc.css("#sub_div span.subl div.node").each do |div|
  if num = div.text[/[\d,]+/] 
    results << num.to_i
  end
end

p results

--output:--
[2204830011, 1571827122, 3333333]
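One caveat with the [\d,]+ pattern: it accepts commas, but String#to_i stops at the first character that can't be part of an integer, so a text such as 1,571,827 would become 1. If the page ever formatted numbers with thousands separators (an assumption; the sample data has none), deleting the commas before converting avoids that. A plain-Ruby sketch with hypothetical node texts:

```ruby
# Hypothetical .node texts; the second one uses thousands separators.
node_texts = ["2204830011", "1,571,827", "..."]

# Same extraction as the loop above: String#to_i stops at the first comma.
naive = node_texts.filter_map { |text| text[/[\d,]+/]&.to_i }

# Strip the separators before converting.
fixed = node_texts.filter_map { |text| text[/[\d,]+/]&.delete(",")&.to_i }

p naive # => [2204830011, 1]
p fixed # => [2204830011, 1571827]
```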
answered 2013-09-07T06:48:26.677

Try the following:

children = @data.css('#sub_div')
children_array = children.css('.subl .node').map { |node| node.text.to_i }

Or:

children = @data.css('#sub_div')
children_array = children.css('.subl .node').map(&:text).map(&:to_i)
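Both forms call String#to_i on every node, and to_i returns 0 for text that doesn't start with a number, so with sample HTML like the question's the ... placeholders would come through as zeros. If the real page could contain non-numeric .node text, one option is to keep only all-digit texts before converting. A sketch on plain strings standing in for node.text (the values are hypothetical):

```ruby
# Plain-string stand-ins for node.text of each .subl .node div.
node_texts = ["2204830011", "1571827122", "...", "3333333"]

# Converting every text, as both forms in the answer do: "..." becomes 0.
with_zeros = node_texts.map(&:to_i)

# Keep only texts that are entirely digits (after trimming whitespace).
numbers_only = node_texts.map(&:strip).grep(/\A\d+\z/).map(&:to_i)

p with_zeros   # => [2204830011, 1571827122, 0, 3333333]
p numbers_only # => [2204830011, 1571827122, 3333333]
```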
answered 2013-09-07T04:45:16.730

Here is another approach:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-eotl
<div id="sub_div">
    <span class="subl">
      <div class="node">2204830011</div>
      <div class="node">1571827122</div>     
      <div class="node">...</div>    
      <div class="node">...</div>     
      <div class="node">...</div>      
    </span>

    <span class="subl">
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
    </span>
    <span class="subl">
      <div class="node">3333333</div>     
      <div class="node">...</div>     
      <div class="node">...</div>     
   </span>
</div>
   eotl

doc.xpath("//div[@id='sub_div']//div[@class='node'][boolean(number()) or . = 0]").map{|n| n.text.to_i}
# => [2204830011, 1571827122, 3333333]
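The predicate [boolean(number()) or . = 0] keeps a node whose string value converts to a number: number() yields NaN for text such as ..., and boolean() is false for both NaN and 0, which is why the . = 0 clause is there to keep a literal zero. Below is a rough plain-Ruby analogue, assuming XPath 1.0 string-to-number rules (optional whitespace, optional minus sign, decimal digits with an optional fractional part); xpath_numeric? is a hypothetical helper name, not part of Nokogiri:

```ruby
# Approximates XPath 1.0 number() on a string: optional whitespace, an
# optional minus sign, then digits with an optional fractional part.
XPATH_NUMBER = /\A\s*-?(?:\d+(?:\.\d*)?|\.\d+)\s*\z/

# Hypothetical helper: true when number(text) would not be NaN, which is
# what [boolean(number()) or . = 0] effectively tests for.
def xpath_numeric?(text)
  XPATH_NUMBER.match?(text)
end

texts = ["2204830011", "...", "0", " 42 ", "1,234"]
kept = texts.select { |text| xpath_numeric?(text) }.map(&:to_i)

p kept # => [2204830011, 0, 42]
```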
answered 2013-09-07T07:56:39.363