ruby-on-rails - Using a Tempfile twice?

Question

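I'm running into what I think is a Tempfile-related problem in a simple program. I'm using 'open-uri' and 'nokogiri', and I want to run a regex search over the document as well as an XPath search with Nokogiri. However, I can't seem to do both without making two separate requests for the document, which creates two separate tempfiles. This works, but it makes two requests: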

require 'open-uri'
require 'nokogiri'

source_url = "http://foo.com/"
# grab the HTML document and assign it to a variable (first request)
doc = URI.open(source_url)
# grab the HTML document again, convert it to a Nokogiri object and assign it (second request)
noko_doc = Nokogiri::HTML(URI.open(source_url))

# create an array of stuff
foo = noko_doc.xpath("//some element").to_a
# create another array of stuff
bar = []
doc.each_line do |line|
  abstract_matches = line.scan(/some regex string/)
  # keep only the matches that also match the second pattern
  abstract_matches.select! { |item| item.to_s.match(/yet another regex string/) }
  abstract_matches.each { |match| bar << "#{match} / " }
end
# all that for this
puts foo + bar

I'd prefer to pass the `doc` variable into Nokogiri::HTML and also iterate over it. Any help?

score 2 · Accepted Answer

Iterating over a Tempfile directly isn't very common. It's more usual to read it once and work with the resulting String:

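require 'open-uri'
require 'nokogiri'

html = URI.open(source_url).read   # a single request; read the whole body into a String
noko_doc = Nokogiri::HTML(html)    # parse that String with Nokogiri
html.each_line do |line|           # ...and reuse the same String for line-by-line regex work
  # do stuff
end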
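To make this concrete, here is a minimal sketch that runs with only the standard library (no network access, no nokogiri gem): a Tempfile stands in for what `open-uri` hands back, its body is read once into a String, and that one String is scanned line by line with the question's two-stage regex filter. The sample HTML, both regexes, and the helper `collect_matches` are hypothetical placeholders; in real code the same `html` String would also be passed to `Nokogiri::HTML(html)`.

```ruby
require 'tempfile'

# Hypothetical helper: scan each line with `pattern`, keep only the matches
# that also satisfy `filter`, and format them the way the question does.
def collect_matches(html, pattern, filter)
  html.each_line.flat_map do |line|
    line.scan(pattern)
        .select { |item| item.to_s.match?(filter) }
        .map { |match| "#{match} / " }
  end
end

# Stand-in for the Tempfile that open-uri may return for a larger response.
file = Tempfile.new('page')
file.write("<p>abc-1 abd-2</p>\n<p>abe-3</p>\n<p>xyz-4</p>\n")
file.rewind

html = file.read   # read the body exactly once
file.close!        # close and delete the Tempfile; it is no longer needed

# `html` is a plain String, so it can be parsed and scanned as often as needed.
bar = collect_matches(html, /ab\w-\d/, /-[13]/)
puts bar.inspect   # → ["abc-1 / ", "abe-3 / "]
```

Because everything downstream works on `html`, only one request is made and the Tempfile can be thrown away immediately.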

score 1 · Accepted Answer

You can parse HTML from a string; see the tutorial.

Can't you just put `doc` into a string and have Nokogiri parse it from there?
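If you would rather keep `doc` as an IO object instead of a String, another option is to rewind it between passes. `open-uri` returns a StringIO for small bodies and a Tempfile for larger ones; both support `#rewind`, so once one consumer has read to the end you can move back to the start and read the same object again without a second request. The sketch below assumes a local StringIO in place of the downloaded page and shows that a second read without `rewind` comes back empty.

```ruby
require 'stringio'

# Stand-in for the IO object open-uri returns (StringIO for small bodies,
# Tempfile for larger ones); both support #read, #each_line and #rewind.
doc = StringIO.new("<p>first line</p>\n<p>second line</p>\n")

# First pass: consume the whole stream (passing it to Nokogiri::HTML
# would likewise read it to the end).
first_pass = doc.read

# Without rewinding, the stream is at EOF and another read returns "".
exhausted = doc.read

# Rewind to the start and iterate over the same object a second time.
doc.rewind
second_pass = doc.each_line.map(&:chomp)

puts second_pass.inspect
```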
