ruby-on-rails - Cucumber: reading a PDF into a temp file

Question

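I have set up a Cucumber suite that reads static PDF files and makes assertions about their content.

I recently updated all of my gems, and since then it has stopped working.

The Cucumber step is as follows: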

When /^I follow PDF link "([^"]*)"$/ do |arg1|
  temp_pdf = Tempfile.new('foo')
  temp_pdf << page.body
  temp_pdf.close
  temp_txt = Tempfile.new('txt')
  temp_txt.close
  `pdftotext -q #{temp_pdf.path} #{temp_txt.path}`
  page.driver.instance_variable_set('@body', File.read(temp_txt.path))
end

This used to work fine, but after upgrading to Lion and updating my gems, the line temp_pdf << page.body raises the following error:

encoding error: output conversion failed due to conv error, bytes 0xA3 0xC3 0x8F 0xC3
I/O error : encoder error

I have tried several PDFs from different sources, and they all seem to fail. How can I read a PDF into a temp file?

score 4 · Accepted Answer

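The following code works for me. I had to change temp_pdf << page.body to use page.source (because body has already been parsed, and parsed incorrectly). I also had to set the instance variable @dom on the driver's browser instead of @body on the driver. This is because in recent Capybara versions the rack_test driver no longer has a @body instance variable; instead, body calls '@browser.body':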

https://github.com/jnicklas/capybara/blob/master/lib/capybara/rack_test/driver.rb

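browser.body in turn calls 'dom.to_xml', and if you look at 'dom' you will see that it initializes @dom with Nokogiri::HTML, so it makes sense that there were conversion errors in the first place.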

https://github.com/jnicklas/capybara/blob/master/lib/capybara/rack_test/browser.rb

with_scope(selector) do
  click_link(label)
  temp_pdf = Tempfile.new('pdf')
  temp_pdf << page.source
  temp_pdf.close
  temp_txt = Tempfile.new('txt')
  temp_txt.close
  temp_txt_path = "#{temp_txt.path}.html"
  `pdftohtml -c -noframes #{temp_pdf.path} #{temp_txt_path}`
  page.driver.browser.instance_variable_set('@dom', Nokogiri::HTML(File.read(temp_txt_path)))
end
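
As a side note, the temp-file part of the step above can be sketched on its own. This is only a minimal illustration, not part of the original answer: the helper names write_binary_tempfile and pdftotext_args are hypothetical, and switching the temp file to binary mode is an assumption on my part (it should keep Ruby from applying newline or encoding conversion to the PDF bytes). The sample bytes are taken from the error message in the question as stand-in data.

```ruby
require 'tempfile'

# Hypothetical helper (not from the answer): write raw bytes to a temp file
# in binary mode and return the closed Tempfile object.
def write_binary_tempfile(bytes, basename = 'pdf')
  file = Tempfile.new(basename)
  file.binmode        # assumption: avoid newline/encoding conversion on write
  file.write(bytes)
  file.close          # flush to disk; the file stays until it is unlinked
  file
end

# Hypothetical helper: argument list for pdftotext, meant for system(*args).
# Passing separate arguments keeps the paths away from shell interpolation.
def pdftotext_args(pdf_path, txt_path)
  ['pdftotext', '-q', pdf_path, txt_path]
end

# Stand-in data: a PDF-like header followed by the bytes from the error message.
sample = "%PDF-1.4\n\xA3\xC3\x8F\xC3".b
pdf = write_binary_tempfile(sample)

# The bytes read back should match what was written.
puts File.binread(pdf.path) == sample
```

In a step definition, something like system(*pdftotext_args(pdf.path, txt.path)) could then replace the backtick call, assuming pdftotext is installed and on the PATH.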
