I'm running DelayedJob and Mechanize in my Rails 3.2.13 app, though these two may not be part of the problem. The app ran fine until a week ago. Now, when I run it in production and call the delayed method `scrape`, it reports:

Jul 03 13:57:19 myapp app/worker.1:     (30.7ms)  SELECT COUNT(*) AS count_all, priority AS priority FROM "delayed_jobs" WHERE (run_at < '2013-07-03 20:57:19.344207' and failed_at is NULL) GROUP BY priority
Jul 03 13:58:01 myapp app/worker.1:  [Worker(host:1d2342a-b234f-4342-bcd7-0afsji3e60dab pid:2)] Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1 - 0 failed attempts
Jul 03 13:58:01 myapp app/worker.1:  2013-07-03T20:58:01+0000: [Worker(host:1d30481a-cf4f-4344-bad7-0e5e0ae60cab pid:2)] Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1 - 0 failed attempts
Jul 03 13:58:01 myapp app/worker.1:     (3.1ms)  BEGIN
Jul 03 13:58:01 myapp app/worker.1:     (18.3ms)  UPDATE "delayed_jobs" SET "last_error" = 'U+03B1 from UTF-8 to ISO-8859-1 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:57:in `encode'' 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:57:in `encode_to'' 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:43:in `from_native_charset'' 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/form.rb:243:in `from_native_charset'' 

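The code point U+03B1 corresponds to the letter alpha (α). However, I never write an alpha anywhere in my code, since I have no use for it. I thought it might be related to my Twitter Bootstrap installation, but I just uninstalled it and the problem didn't go away.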

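Here is the object it is processing. I noticed that in every entry I've tried, there is always an exclamation mark before the value of the `mylist` attribute. I'm not sure why.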

19:51:06 web.1 | SQL (0.8ms) INSERT INTO "delayed_jobs" ("attempts", "created_at", "failed_at", "handler", "last_error", "locked_at", "locked_by", "priority", "queue", "run_at", "updated_at") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [["attempts", 0], ["created_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00], ["failed_at", nil], ["handler", "--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/ActiveRecord:Doilist\n attributes:\n id: 188\n mylist: ! \"VWQ.U9JF.45.4595.pdf\\r\\nVWQ.U9JF.45.4595.xml\\r\\n \\r\\nVWQ.U9JF.46.1558.pdf\\r\\nVWQ.U9JF.46.1558.xml\\r\\n\n \\r\\nVWQ.U9JF.421234.pdf\\r\\nVWQ.U9JF.461764.xml\\r\\n \\r\\nVWQ.U9JF.434147.pdf\"\n created_at: 2013-07-03 23:51:06.694626000 Z\n updated_at: 2013-07-03 23:51:06.694626000 Z\n myuserid: myemail@email.com\n mypass: mypassword\n mymonth: '7'\n mydate: '1'\n myyear: '1'\nmethod_name: :scrape\nargs: []\n"], ["last_error", nil], ["locked_at", nil], ["locked_by", nil], ["priority", 0], ["queue", nil], ["run_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00], ["updated_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00]]

I'm happy to post any files on request, since I'm guessing this is a complicated problem. Thank you very much for your input.

1 Answer

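Thanks to Domon's suggestion, I solved the problem. The web page Mechanize was interacting with is encoded in ISO-8859-1 (see more on how to detect this here), while my system was trying to read the page as UTF-8. To fix this, I put `agent.page.encoding = 'utf-8'` into my `scrape` method, as shown in Niels Kristian's answer. (See also denis.peplin's answer for further clarification on where to write it.) This forces the page into the encoding my system needs in order to read it (UTF-8).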
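For anyone who wants to see the failure in isolation, here is a minimal sketch in plain Ruby, with no Mechanize involved. The backtrace suggests the error is raised when a UTF-8 string is converted into the page's charset with `String#encode`. The helper names `encode_for_page` and `try_encode` are hypothetical and only stand in for that conversion step; they are not Mechanize's API. The commented-out line at the end shows where the fix would go, assuming `agent` is the Mechanize instance used in `scrape` and has already fetched the page.

```ruby
# Hypothetical stand-in for the conversion step seen in the backtrace:
# convert a string into the charset the page declares.
def encode_for_page(value, page_encoding)
  value.encode(page_encoding)
end

# Returns "ok" if the conversion succeeds, otherwise the error message.
def try_encode(value, page_encoding)
  encode_for_page(value, page_encoding)
  "ok"
rescue Encoding::UndefinedConversionError => e
  e.message
end

alpha = "\u03B1" # Greek small letter alpha, a UTF-8 string

# ISO-8859-1 has no alpha, so the conversion raises.
puts try_encode(alpha, "ISO-8859-1")

# With UTF-8 as the target there is nothing to convert, so it succeeds.
puts try_encode(alpha, "UTF-8")

# In the real scrape method (assumption: `agent` is the Mechanize instance
# and a page has already been fetched), the fix from this answer is:
#   agent.page.encoding = 'utf-8'
```

The first call reports the same kind of `UndefinedConversionError` as the worker log, and the second does not, which is consistent with the fix: once the page encoding is set to UTF-8, strings no longer have to be squeezed into ISO-8859-1.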

answered 2013-07-09T21:14:05.390