java - Encoding problem when reading email content with JavaMail

Question

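I'm using JavaMail 1.4.1 (I upgraded to 1.4.5 but have the same problem) to read messages from an email account, but I'm running into a problem with the encoding of the content: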

POP3Message pop3message;
... 
Object contentObject = pop3message.getContent();
...   
String contentType = pop3message.getContentType();
String content = contentObject.toString();

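Some messages are read correctly, but others contain strange characters because the wrong encoding is applied. I've noticed that it fails for one specific content type.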

It works fine if the contentType is any of these:

text/plain; charset=ISO-8859-1

text/plain;
charset="iso-8859-1"

text/plain;
charset="ISO-8859-1";
format="flowed"

text/plain; charset=windows-1252

But it does not work if it is:

text/plain;
charset="utf-8"

For this contentType (the UTF-8 one), if I try to get the encoding (pop3message.getEncoding()) I get

quoted-printable

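For example, with this last encoding I get the following String value in the debugger (and it looks the same in the database after persisting the object):

UbicaciÃ³n (instead of Ubicación)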
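To see where a value like that can come from, here is a small sketch. It assumes the body contains "Ubicación" encoded as UTF-8 and then quoted-printable ("Ubicaci=C3=B3n"), and shows that decoding the same bytes as ISO-8859-1 instead of UTF-8 yields exactly this kind of garbling. The decoder below is a hypothetical, minimal helper for illustration only; JavaMail normally undoes the quoted-printable transfer encoding for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QpDemo {

    // Hypothetical minimal quoted-printable decoder (not JavaMail's):
    // "=XX" becomes byte 0xXX, and "=" before a line break is a soft break that is dropped.
    // Malformed escapes are not handled; this is a sketch, not a full RFC 2045 decoder.
    static byte[] decodeQuotedPrintable(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c != '=') {
                out.write(c); // quoted-printable text is ASCII, so one char is one byte
                continue;
            }
            if (qp.startsWith("\r\n", i + 1)) {        // soft line break "=\r\n"
                i += 2;
            } else if (qp.startsWith("\n", i + 1)) {   // soft line break "=\n"
                i += 1;
            } else {                                   // "=XX" hex escape
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2;
            }
        }
        return out.toByteArray();
    }

    // Turns the decoded bytes into text using the given charset.
    static String decode(String qp, Charset charset) {
        return new String(decodeQuotedPrintable(qp), charset);
    }

    public static void main(String[] args) {
        String body = "Ubicaci=C3=B3n";
        System.out.println(decode(body, StandardCharsets.UTF_8));      // Ubicación
        System.out.println(decode(body, StandardCharsets.ISO_8859_1)); // UbicaciÃ³n
    }
}
```

The two bytes C3 B3 are one character (ó) in UTF-8 but two characters (Ã and ³) in ISO-8859-1, which is why a wrong charset doubles up accented letters.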

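However, if I open the email with the webmail client in my browser, I can read it without any problem. It's a plain message (no attachments, only text), so the message itself seems to be fine.

Any ideas on how to solve this?

Thanks.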

UPDATE: This is a piece of code I added to try the getUTF8Content() function provided by jlordo:

POP3Message pop3message = (POP3Message) message;
String uid = pop3folder.getUID(message);

//START JUST FOR TESTING PURPOSES
if(uid.trim().equals("1401")){
    Object utfContent = pop3message.getContent();
    System.out.println(utfContent.getClass().getName()); // it is of type String
    //System.out.println(utfContent); // if not commented out, it prints the content of one of the emails I'm having problems with.
    System.out.println(pop3message.getEncoding()); //prints: quoted-printable
    System.out.println(pop3message.getContentType()); //prints: text/plain; charset="utf-8"
    String utfContentString = getUTF8Content(utfContent); // throws java.lang.ClassCastException: java.lang.String cannot be cast to javax.mail.util.SharedByteArrayInputStream
    System.out.println(utfContentString);
}

//END TEST CODE

score 1 · Accepted Answer

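How are you detecting that these messages have "strange characters"? Are you displaying the data somewhere? Whatever method you're using to display the data may not be handling Unicode characters properly.

The first step is to determine whether the problem is that you're getting the wrong characters, or that the correct characters are being displayed incorrectly. You can examine the Unicode value of each character in the data (e.g., in the String returned by the getContent method) to make sure each character has the correct Unicode value. If it does, the problem is in the method you're using to display the characters.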
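A minimal sketch of that check: print the code point of every character in the String. The class and method names are hypothetical. For "Ubicación", a correct String contains U+00F3 for ó; if you instead see U+00C3 U+00B3 at that position, the String itself already holds the wrong characters and the display is not to blame.

```java
class CodePointDump {

    // Returns the code points of the text as "U+XXXX" values separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by getContent()
        String content = "Ubicaci\u00f3n";
        System.out.println(describe(content));
    }
}
```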

score 0 · Accepted Answer

Try this and let me know if it works:

if (contentType.toLowerCase().contains("utf-8")) { // contentType from getContentType(), as in the question
    content = getUTF8Content(contentObject);
}

// Needs: java.io.IOException, java.io.InputStreamReader,
// java.nio.charset.StandardCharsets, javax.mail.util.SharedByteArrayInputStream
// TODO take care of IOException and ClassCastException
public static String getUTF8Content(Object contentObject) throws IOException {
    // possible ClassCastException: getContent() may return a String instead of a stream
    SharedByteArrayInputStream sbais = (SharedByteArrayInputStream) contentObject;
    StringBuilder content = new StringBuilder();
    char[] buffer = new char[1024];
    // Decode the raw bytes as UTF-8; try-with-resources closes the reader
    try (InputStreamReader isr = new InputStreamReader(sbais, StandardCharsets.UTF_8)) {
        int charsRead;
        // possible IOException
        while ((charsRead = isr.read(buffer)) != -1) {
            content.append(buffer, 0, charsRead);
        }
    }
    return content.toString();
}

By the way, is JavaMail 1.4.1 a requirement? The latest version is 1.4.5.
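The question's update shows getContent() returning a String, so the cast above fails. A hedged variant, with a hypothetical name, that checks the type instead of casting blindly: a String is returned as-is (JavaMail has already decoded it), and any InputStream (SharedByteArrayInputStream is one) is read and decoded as UTF-8. In JavaMail, Part.getInputStream() also returns the content with the transfer encoding already decoded, which could feed this helper; treat that wiring as an assumption to verify for your version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ContentReader {

    // Hypothetical helper: accepts a String or any InputStream.
    static String contentAsUtf8(Object contentObject) {
        if (contentObject instanceof String) {
            // Already decoded by JavaMail; return it unchanged.
            return (String) contentObject;
        }
        if (contentObject instanceof InputStream) {
            try (InputStream in = (InputStream) contentObject) {
                // readAllBytes() needs Java 9+; decode the raw bytes as UTF-8.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        String type = contentObject == null ? "null" : contentObject.getClass().getName();
        throw new IllegalArgumentException("Unsupported content object: " + type);
    }

    public static void main(String[] args) {
        byte[] utf8 = "Ubicaci\u00f3n".getBytes(StandardCharsets.UTF_8);
        System.out.println(contentAsUtf8(new ByteArrayInputStream(utf8))); // Ubicación
        System.out.println(contentAsUtf8("already a String"));             // already a String
    }
}
```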

score 0 · Accepted Answer

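What worked for me was calling getContentType() and checking whether the string contains "utf" (meaning one of the UTF charsets is declared).

If it does, I handle the content differently in that case: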

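private String encodeCorrectly(InputStream is) {
    // Decode the stream as UTF-8 instead of the platform default charset;
    // the "\\A" delimiter (start of input) makes next() return the whole stream
    java.util.Scanner s = new java.util.Scanner(is, StandardCharsets.UTF_8.name()).useDelimiter("\\A");
    return s.hasNext() ? s.next() : "";
}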

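(Adapted from this SO answer on converting an InputStream to a String.)

The important part here is using the correct charset. That solved the problem for me.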
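Instead of only checking whether the Content-Type contains "utf", one could read the declared charset and decode with it, which would also cover the ISO-8859-1 and windows-1252 messages from the question. JavaMail's javax.mail.internet.ContentType class offers getParameter("charset") for this parsing; the regex below is a hypothetical pure-JDK stand-in, a sketch rather than a complete header parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetFromContentType {

    // Matches charset=utf-8 or charset="utf-8", case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the charset declared in a Content-Type value, or the fallback
    // when none is declared or the JVM does not recognize the name.
    static Charset charsetOf(String contentType, Charset fallback) {
        Matcher m = CHARSET.matcher(contentType);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException and UnsupportedCharsetException
                // both extend IllegalArgumentException; fall through to the fallback.
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String header = "text/plain;\r\n\tcharset=\"utf-8\"";
        System.out.println(charsetOf(header, StandardCharsets.ISO_8859_1)); // UTF-8
    }
}
```

The returned Charset can then be passed to new String(bytes, charset) or to an InputStreamReader in place of the hard-coded UTF-8.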

score 0 · Accepted Answer

First, you have to set the headers for UTF-8 encoding, like this:

...
MimeMessage msg = new MimeMessage(session);
msg.setHeader("Content-Type", "text/html; charset=UTF-8");
msg.setHeader("Content-Transfer-Encoding", "8bit");

msg.setFrom(new InternetAddress(doConversion(from)));
msg.setRecipients(javax.mail.Message.RecipientType.TO, address);
msg.setSubject(asunto, "UTF-8");

MimeBodyPart mbp1 = new MimeBodyPart();
mbp1.setContent(text, "text/html; charset=UTF-8");
Multipart mp = new MimeMultipart();
mp.addBodyPart(mbp1);
...

But for the 'from' header, I use the following method to convert the characters:

public String doConversion(String original) {
    if (original == null) return null;
    // Each character becomes its UTF-8 bytes read as ISO-8859-1 chars.
    // Do ¿, ª and º first: ú's replacement contains º and must not be converted again.
    String converted = original.replace("¿", "\u00c2\u00bf");
    converted = converted.replace("ª", "\u00c2\u00aa");
    converted = converted.replace("º", "\u00c2\u00ba"); // º is U+00BA: UTF-8 bytes C2 BA
    converted = converted.replace("€", "\u00e2\u0082\u00ac"); // € is U+20AC: UTF-8 bytes E2 82 AC
    converted = converted.replace("á", "\u00c3\u00a1");
    converted = converted.replace("Á", "\u00c3\u0081");
    converted = converted.replace("é", "\u00c3\u00a9");
    converted = converted.replace("É", "\u00c3\u0089");
    converted = converted.replace("í", "\u00c3\u00ad");
    converted = converted.replace("Í", "\u00c3\u008d");
    converted = converted.replace("ó", "\u00c3\u00b3");
    converted = converted.replace("Ó", "\u00c3\u0093");
    converted = converted.replace("ú", "\u00c3\u00ba");
    converted = converted.replace("Ú", "\u00c3\u009a");
    converted = converted.replace("ñ", "\u00c3\u00b1");
    converted = converted.replace("Ñ", "\u00c3\u0091");
    return converted;
}

If you need to handle other characters, you can look up the corresponding UTF-8 hex encodings at http://www.fileformat.info/info/charset/UTF-8/list.htm.
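The replacement table builds, character by character, the UTF-8 bytes of each character reinterpreted as ISO-8859-1 characters (for example, á becomes \u00c3\u00a1). A sketch that produces the same kind of string for every character without a lookup table is shown below; the method name is hypothetical. If the goal is only a correctly encoded personal name in the From header, JavaMail's InternetAddress(String address, String personal, String charset) constructor, which encodes the personal name for you, is likely a cleaner route, though I have not checked it against this exact setup.

```java
import java.nio.charset.StandardCharsets;

class Utf8AsLatin1 {

    // Encodes the text as UTF-8, then reads each byte back as one ISO-8859-1 char.
    // ASCII is unchanged; every non-ASCII character becomes 2 to 4 chars.
    static String doConversionGeneric(String original) {
        if (original == null) {
            return null;
        }
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(doConversionGeneric("Jos\u00e9")); // José -> JosÃ©
    }
}
```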
