
So I'm looping over the elements of an array, and this is what comes back:

[nil, [#<Nokogiri::XML::Element:0x835386d4 name="a" attributes=[#<Nokogiri::XML::Attr:0x835385f8 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x835381c0 "Web Designer Full time">]>

What I want to do is access the href value and then the text value. How do I do that?

I tried this:

puts i[:href]

But it produces this error:

TypeError: Symbol as array index

By the way, I get at each element i of the array like this:

contents.each do |i|
    puts i.inspect
    puts i[:href]
end
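Edit 1 below shows that contents is actually a Hash. When Hash#each is given a block with a single parameter, it yields each entry as one two-element [key, value] Array, and Array#[] only accepts integer indexes (or ranges), so i[:href] raises a TypeError. A minimal plain-Ruby reproduction, assuming a hypothetical Anchor Struct as a stand-in for Nokogiri::XML::Element (both respond to [:href] and text):

```ruby
# Hypothetical stand-in for Nokogiri::XML::Element: Struct instances
# also answer obj[:href], and expose #text as a plain member.
Anchor = Struct.new(:href, :text)

# Same shape as the question's data: one nil key mapped to a list of links.
contents = {
  nil => [Anchor.new("http://bham.craigslist.org/web/2961573018.html",
                     "Web Designer Full time")]
}

pair_classes = []
errors = []

contents.each do |i|
  pair_classes << i.class   # each entry arrives as one [key, value] Array
  begin
    puts i[:href]           # Array#[] rejects a Symbol index
  rescue TypeError => e
    errors << e
    puts "TypeError: #{e.message}"
  end
end
```

Older Rubies (1.8) word the message "Symbol as array index", as in the question; 1.9 and later say "no implicit conversion of Symbol into Integer". Either way the fix is to index the link elements inside the pair, not the pair itself.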

Edit 1:

This is how I generate the contents array. I didn't rename it, since that would just be confusing :)

contents = {}
first_items.each do |link|
    content_url = link
    content_page = Nokogiri::HTML(open(content_url))
    contents[link[:href]] = content_page.css("p a")
end

puts contents.inspect

This is the output I get:

{nil=>[#<Nokogiri::XML::Element:0x85fee914 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee838 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x85fee400 "Web Designer Full time">]>, #<Nokogiri::XML::Element:0x85fee298 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee1bc name="href" value="http://bham.craigslist.org/web/2959813303.html">] children=[#<Nokogiri::XML::Text:0x85fedd84 "Once in a lifetime opportunity...">]>, #<Nokogiri::XML::Element:0x85fedc1c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fedb40 name="href" value="http://bham.craigslist.org/web/2925485723.html">] children=[#<Nokogiri::XML::Text:0x85fed708 "Website Designer and Blogging Internship!">]>, #<Nokogiri::XML::Element:0x85fed5a0 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fed4c4 name="href" value="http://bham.craigslist.org/web/2918424652.html">] children=[#<Nokogiri::XML::Text:0x85fed08c "Excellent Java Developer Opportunity!">]>, #<Nokogiri::XML::Element:0x85fecf24 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fece48 name="href" value="http://bham.craigslist.org/web/2888669703.html">] children=[#<Nokogiri::XML::Text:0x85feca10 "Freelance Graphic Design">]>, #<Nokogiri::XML::Element:0x85fec8a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec7cc name="href" value="http://bham.craigslist.org/web/2900256461.html">] children=[#<Nokogiri::XML::Text:0x85fec394 "GWT/GXT Developer">]>, #<Nokogiri::XML::Element:0x85fec22c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec150 name="href" value="http://bham.craigslist.org/web/2897641463.html">] children=[#<Nokogiri::XML::Text:0x85febd18 "Website hiring!">]>]}
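The inspect output shows a Hash with exactly one key, nil, whose value is the list of a elements. (If link[:href] evaluates to nil on every pass, each iteration overwrites the previous page's entry under that same nil key, so it may be worth checking what first_items actually holds.) Giving the block two parameters splits each pair, after which every link can be indexed directly. A sketch, again assuming a hypothetical Anchor Struct in place of the real Nokogiri elements:

```ruby
# Hypothetical stand-in for Nokogiri::XML::Element (supports [:href] and #text).
Anchor = Struct.new(:href, :text)

contents = {
  nil => [
    Anchor.new("http://bham.craigslist.org/web/2961573018.html", "Web Designer Full time"),
    Anchor.new("http://bham.craigslist.org/web/2959813303.html", "Once in a lifetime opportunity...")
  ]
}

rows = []
# Two block parameters destructure each [key, value] pair; the key is nil here.
contents.each do |_key, links|
  links.each do |link|
    rows << [link[:href], link.text]
  end
end

rows.each { |href, text| puts "#{text}: #{href}" }
```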

And this is the full output for the value of i:

--------------------
This is the value of i: 
[nil, [#<Nokogiri::XML::Element:0x85fee914 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee838 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x85fee400 "Web Designer Full time">]>, #<Nokogiri::XML::Element:0x85fee298 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fee1bc name="href" value="http://bham.craigslist.org/web/2959813303.html">] children=[#<Nokogiri::XML::Text:0x85fedd84 "Once in a lifetime opportunity...">]>, #<Nokogiri::XML::Element:0x85fedc1c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fedb40 name="href" value="http://bham.craigslist.org/web/2925485723.html">] children=[#<Nokogiri::XML::Text:0x85fed708 "Website Designer and Blogging Internship!">]>, #<Nokogiri::XML::Element:0x85fed5a0 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fed4c4 name="href" value="http://bham.craigslist.org/web/2918424652.html">] children=[#<Nokogiri::XML::Text:0x85fed08c "Excellent Java Developer Opportunity!">]>, #<Nokogiri::XML::Element:0x85fecf24 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fece48 name="href" value="http://bham.craigslist.org/web/2888669703.html">] children=[#<Nokogiri::XML::Text:0x85feca10 "Freelance Graphic Design">]>, #<Nokogiri::XML::Element:0x85fec8a8 name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec7cc name="href" value="http://bham.craigslist.org/web/2900256461.html">] children=[#<Nokogiri::XML::Text:0x85fec394 "GWT/GXT Developer">]>, #<Nokogiri::XML::Element:0x85fec22c name="a" attributes=[#<Nokogiri::XML::Attr:0x85fec150 name="href" value="http://bham.craigslist.org/web/2897641463.html">] children=[#<Nokogiri::XML::Text:0x85febd18 "Website hiring!">]>]]
--------------------
This is the value of i.href: 

Edit 2:

By the way, this is what the actual HTML output looks like. Here is what I did:

builder = Nokogiri::HTML::Builder.new do |doc|
    doc.html {
        doc.body {
            contents.each do |el|
                if !el.nil?
                    puts "-" * 20
                    puts "This is the value of el: "
                    puts el.inspect

                    puts "-" * 20
                    puts "This is the value of el.href: "
                    puts el[:href]
                end

                doc.p {
                    doc.a el, :href => el
                }
            end
        }
    }
end

puts "*" * 50
puts "This is the HTML generated"

puts builder.to_html

This is what it looks like:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p><a href="&lt;a%20href=%22http://bham.craigslist.org/web/2961573018.html%22&gt;Web%20Designer%20Full%20time&lt;/a&gt;&lt;a%20href=%22http://bham.craigslist.org/web/2959813303.html%22&gt;Once%20in%20a%20lifetime%20opportunity...&lt;/a&gt;&lt;a%20href=%22http://bham.craigslist.org/web/2925485723.html%22&gt;Website%20Designer%20and%20Blogging%20Internship!&lt;/a&gt;&lt;a%20href=%22http://bham.craigslist.org/web/2918424652.html%22&gt;Excellent%20Java%20Developer%20Opportunity!&lt;/a&gt;&lt;a%20href=%22http://bham.craigslist.org/web/2888669703.html%22&gt;Freelance%20Graphic%20Design&lt;/a&gt;&lt;a%20href=%22http://bham.craigslist.org/web/2900256461.html%22&gt;GWT/GXT%20Developer&lt;/a&gt;&lt;a%20href=%22http://bham.craigslist.org/web/2897641463.html%22&gt;Website%20hiring!&lt;/a&gt;">&lt;a href="http://bham.craigslist.org/web/2961573018.html"&gt;Web Designer Full time&lt;/a&gt;&lt;a href="http://bham.craigslist.org/web/2959813303.html"&gt;Once in a lifetime opportunity...&lt;/a&gt;&lt;a href="http://bham.craigslist.org/web/2925485723.html"&gt;Website Designer and Blogging Internship!&lt;/a&gt;&lt;a href="http://bham.craigslist.org/web/2918424652.html"&gt;Excellent Java Developer Opportunity!&lt;/a&gt;&lt;a href="http://bham.craigslist.org/web/2888669703.html"&gt;Freelance Graphic Design&lt;/a&gt;&lt;a href="http://bham.craigslist.org/web/2900256461.html"&gt;GWT/GXT Developer&lt;/a&gt;&lt;a href="http://bham.craigslist.org/web/2897641463.html"&gt;Website hiring!&lt;/a&gt;</a></p></body></html>
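The generated markup has the whole escaped list of links stuffed into both the href attribute and the link text, because doc.a el, :href => el passes el (an entire [key, value] pair from the Hash) rather than one link's text and URL. The intended output needs one p/a per link, built from that link's href and text. The sketch below swaps Nokogiri::HTML::Builder for plain string building with ERB::Util.html_escape from the standard library, and assumes a hypothetical Anchor Struct in place of the real elements; with Builder, the analogous change would presumably be looping over the links and calling doc.a link.text, :href => link[:href].

```ruby
require 'erb'

# Hypothetical stand-in for Nokogiri::XML::Element (supports [:href] and #text).
Anchor = Struct.new(:href, :text)

contents = {
  nil => [
    Anchor.new("http://bham.craigslist.org/web/2961573018.html", "Web Designer Full time"),
    Anchor.new("http://bham.craigslist.org/web/2925485723.html", "Website Designer and Blogging Internship!")
  ]
}

# Walk the individual links (the Hash values), not the [key, value] pairs.
links = contents.values.flat_map(&:to_a)

# One <p><a> per link, escaping the attribute value and the text separately.
paragraphs = links.map do |link|
  href = ERB::Util.html_escape(link[:href])
  text = ERB::Util.html_escape(link.text)
  %(<p><a href="#{href}">#{text}</a></p>)
end

html = "<html><body>#{paragraphs.join}</body></html>"
puts html
```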

3 Answers


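I think this can be much simpler. Nokogiri has already parsed the document and gives you convenient ways to get at its contents. Instead of looping, storing Nokogiri objects and then trying to pull data back out of them, why not take a more direct approach?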

Try this code:

content_page.search('//a[@href]').map { |el| [el[:href], el.text] }

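This creates a 2D array holding the href and text of every link in the document, which, judging from your follow-up comment, is what you are actually trying to get.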
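The map step can be tried without fetching a page. This sketch assumes a hypothetical Anchor Struct standing in for the elements that search('//a[@href]') would return; each one answers [:href] and text just like the real nodes do in this answer:

```ruby
# Hypothetical stand-in for Nokogiri::XML::Element (supports [:href] and #text).
Anchor = Struct.new(:href, :text)

# Stand-ins for the nodes a search('//a[@href]') call might return.
anchors = [
  Anchor.new("http://bham.craigslist.org/web/2918424652.html", "Excellent Java Developer Opportunity!"),
  Anchor.new("http://bham.craigslist.org/web/2888669703.html", "Freelance Graphic Design")
]

# Same transformation as the answer: one [href, text] row per link.
pairs = anchors.map { |el| [el[:href], el.text] }

pairs.each { |href, text| puts "#{text} -> #{href}" }
```

Because each row is a two-element array, the result destructures cleanly in a two-parameter block, as in the last line.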

answered 2012-04-24T11:49:00.243

You can use compact to remove the nil:

nodes.compact.each do |node|
  puts node[:href], node.text
end
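Applied to the question's i, which is a [nil, links] pair, compact drops the nil but leaves the links nested one level down, so a flatten(1) is needed before each node can be indexed. A sketch assuming nodes is derived from that pair, with a hypothetical Anchor Struct standing in for the Nokogiri elements:

```ruby
# Hypothetical stand-in for Nokogiri::XML::Element (supports [:href] and #text).
Anchor = Struct.new(:href, :text)

# The question's i: a [key, value] pair whose key is nil.
i = [nil, [Anchor.new("http://bham.craigslist.org/web/2900256461.html", "GWT/GXT Developer"),
           Anchor.new("http://bham.craigslist.org/web/2897641463.html", "Website hiring!")]]

compacted = i.compact          # drops the nil, but the links stay nested
nodes = compacted.flatten(1)   # lift the links up one level

nodes.each do |node|
  puts node[:href], node.text
end
```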
answered 2012-04-24T00:40:31.437

Maybe something like this, since there is a stray nil in your array:

contents.each do |i|
  if !i.nil?
    puts i.inspect
    puts i[:href]
  end
end

Edit 1: Actually, I think you just need to do contents = contents[1]:

contents = contents[1]
contents.each do |i|
    puts i.inspect
    puts i[:href]
end
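Note that contents[1] is an Array index: it picks the list of links out of a single [nil, links] pair like the question's i. On the Hash built in Edit 1 of the question, however, contents[1] looks up the key 1, which does not exist, and returns nil. A sketch of the Hash equivalent, assuming a hypothetical Anchor Struct in place of the Nokogiri elements; contents[nil] would work too, since nil is the only key:

```ruby
# Hypothetical stand-in for Nokogiri::XML::Element (supports [:href] and #text).
Anchor = Struct.new(:href, :text)

contents = {
  nil => [Anchor.new("http://bham.craigslist.org/web/2961573018.html", "Web Designer Full time")]
}

by_index = contents[1]            # Hash#[] looks up the key 1, which is absent, so nil
links    = contents.values.first  # the list stored under the single (nil) key

links.each do |i|
  puts i[:href], i.text
end
```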
answered 2012-04-24T00:30:02.513