使用mail
红宝石我收到这条消息:
mail.rb:22:in `encode': "\xC7" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
from mail.rb:22:in `<main>'
如果我删除编码,我会收到一条消息 ruby
/var/lib/gems/1.9.1/gems/bson-1.7.0/lib/bson/bson_ruby.rb:63:in `rescue in to_utf8_binary': String not valid utf-8: "<div dir=\"ltr\"><div class=\"gmail_quote\">l<br><br><br><div dir=\"ltr\"><div class=\"gmail_quote\"><br><br><br><div dir=\"ltr\"><div class=\"gmail_quote\"><br><br><br><div dir=\"ltr\"><div dir=\"rtl\">\xC7\xE1\xE4\xD5 \xC8\xC7\xE1\xE1\xDB\xC9 \xC7\xE1\xDA\xD1\xC8\xED\xC9</div></div>\r\n</div><br></div>\r\n</div><br></div>\r\n</div><br></div>" (BSON::InvalidStringEncoding)
这是我的代码:
require 'mail'
require 'mongo'
connection = Mongo::Connection.new
db = connection.db("DB")
db = Mongo::Connection.new.db("DB")
newsCollection = db["news"]
Mail.defaults do
retriever_method :pop3, :address => "pop.gmail.com",
:port => 995,
:user_name => 'my_username',
:password => '*****',
:enable_ssl => true
end
emails = Mail.last
#Checks if email is multipart and decods accordingly. Put to extract UTF8 from body
plain_part = emails.multipart? ? (emails.text_part ? emails.text_part.body.decoded : nil) : emails.body.decoded
html_part = emails.html_part ? emails.html_part.body.decoded : nil
mongoMessage = {"date" => emails.date.to_s , "subject" => emails.subject , "body" => plain_part.encode('UTF-8') }
msgID = newsCollection.insert(mongoMessage) #add the document to the database and returns it's ID
puts msgID
对于英语和希伯来语,它工作得很好,但似乎 gmail 正在发送具有不同编码的阿拉伯语。替换UTF-8
为ASCII-8BIT
给出类似的错误。
plain_part
用于普通电子邮件时,我得到相同的结果。我正在处理来自一个特定来源的电子邮件,因此我可以确信 html_part 不会导致错误。为了使它更加奇怪,阿拉伯语中的主题被完美呈现。我应该使用什么编码?