java - Unexpected Unicode translation in a Grails/Spring Integration application

Question

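I have a Grails application that contains a Java-based email adapter driven by Spring Integration. The adapter processes emails from a single source and, depending on business rules, reports certain communications back to users by updating some internal tables. Part of that update adds the HTML body of the email to an Oracle CLOB for reference.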

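About half the time, links in the HTML are corrupted by the time they are added to the CLOB. For example, "=df" is interpreted as Unicode U+00DF and converted to "ß" (LATIN SMALL LETTER SHARP S), and "=20" is converted to a space. Both unexpected mappings break the link, as the original and corrupted versions below show.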

http://www.mycompany.com/MyProject/MyApp.xxx?field1=dfa1.0&field2=2.0&field3=20012345&field4=N

http://www.mycompany.com/MyProject/MyApp.xxx?field1ßa1.0&field2=2.0&field3 012345&field4=N

The corruption does not happen every time, and I have not been able to find a pattern for when it does.

This is the only code that "touches" the HTML content of the email...

public void processMessage(Message<?> message) {
    if (message.getPayload() instanceof MimeMessage) {
        MimeMessage mimeMessage = (MimeMessage) message.getPayload();
        try {
            String subject = mimeMessage.getSubject();

            logger.info("Subject : " + subject);

            // Get the main body of the message -- Assumes the email is in HTML format and
            // uses that to isolate the interesting bits of the email to analyze
            String content = convertStreamToString(MimeUtility.decode(mimeMessage.getDataHandler().getDataSource().getInputStream(), "quoted-printable"));
            logger.info("Content Length (bytes) : " + content.length());
            int htmlStart = content.indexOf(HTML_START);
            int htmlEnd = content.lastIndexOf(HTML_END);
            String html;
            try {
                html = content.substring(htmlStart, htmlEnd + HTML_END.length());
            } catch (IndexOutOfBoundsException e) {
                // Don't try and prune the string
                html = content;
            }

            // Do the major processing of the actual HTML contents. This is where the magic happens.
            processHtmlMessageContent(html);

        } catch (MessagingException e) {
            logger.error("Error in processing message:", e);
        } catch (IOException e) {
            logger.error("Error in processing message:", e);
        }
    } else {
        logger.error("DON'T KNOW HOW TO PROCESS [" + message.getPayload().getClass() + "] MESSAGE");
    }
    logger.info("Done.");
}

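I suspect the problem is in convertStreamToString or MimeUtility.decode, but I have not been able to isolate it. I also can't rule out something odd happening when the text is stored in a CLOB, but I think that is unlikely.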

For reference, my convertStreamToString() method is...

protected String convertStreamToString(java.io.InputStream is) {
    try {
        return new java.util.Scanner(is).useDelimiter("\\A").next();
    } catch (java.util.NoSuchElementException e) {
        return "";
    }
}

I tried changing...

String content = convertStreamToString(MimeUtility.decode(mimeMessage.getDataHandler().getDataSource().getInputStream(), "quoted-printable"));

to...

String content = convertStreamToString(mimeMessage.getDataHandler().getDataSource().getInputStream());

but then I lose the basic MIME decoding.

I also tried using MimeUtility to get the encoding:

String encoding = MimeUtility.getEncoding(mimeMessage.getDataHandler().getDataSource());

This returns 7bit. I tried decoding with that, but then I get things like =3D in place of the equals sign.

In the decoded content I get the following, which suggests quoted-printable:

Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I have gone through the javadocs, the source code, and online examples, but none of it has really helped.

score 0 · Accepted Answer

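The conversions you are seeing are exactly what quoted-printable decoding does, so I suspect you are trying to decode data that was not QP-encoded in the first place. You should probably check the headers of the mimeMessage to decide what decoding you need to do, rather than applying QP unconditionally.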
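
To make this concrete, here is a minimal sketch that uses only the JDK. The class TransferDecodingSketch and its methods are hypothetical names made up for illustration; they are not part of JavaMail or of the code in the question. It assumes the body is available as text and that the Content-Transfer-Encoding value has already been read from the part's headers (in JavaMail, MimePart.getEncoding() returns that header value). The hand-written decoder is simplified and lenient: a "=" that is not followed by two hex digits is passed through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of JavaMail.
class TransferDecodingSketch {

    // Simplified quoted-printable decoder: "=XX" becomes the byte 0xXX,
    // "=" directly before a line break is a soft break and is dropped,
    // and any other "=" is copied through unchanged.
    static String decodeQuotedPrintable(String text, Charset charset) {
        byte[] in = text.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            if (b != '=') {
                out.write(b);
                i++;
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2; // soft line break "=\n"
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3; // soft line break "=\r\n"
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                out.write(Character.digit(in[i + 1], 16) * 16 + Character.digit(in[i + 2], 16));
                i += 3; // escaped byte "=XX"
            } else {
                out.write(b); // not a valid escape: keep the "=" as-is
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    // Chooses the decoding from the Content-Transfer-Encoding header value
    // instead of always assuming quoted-printable.
    static String decodeBody(String raw, String transferEncoding, Charset charset) {
        String enc = transferEncoding == null
                ? "7bit" // assumed default when the header is absent
                : transferEncoding.trim().toLowerCase(Locale.ROOT);
        switch (enc) {
            case "quoted-printable":
                return decodeQuotedPrintable(raw, charset);
            case "base64":
                return new String(Base64.getMimeDecoder().decode(raw), charset);
            default: // 7bit, 8bit, binary: no transfer decoding needed
                return new String(raw.getBytes(StandardCharsets.ISO_8859_1), charset);
        }
    }

    public static void main(String[] args) {
        String link = "http://www.mycompany.com/MyProject/MyApp.xxx"
                + "?field1=dfa1.0&field2=2.0&field3=20012345&field4=N";
        // Treating a plain link as quoted-printable rewrites "=df" and "=20".
        System.out.println(decodeBody(link, "quoted-printable", StandardCharsets.ISO_8859_1));
        // Respecting a 7bit header leaves the link untouched.
        System.out.println(decodeBody(link, "7bit", StandardCharsets.ISO_8859_1));
    }
}
```

Run against the link from the question, the quoted-printable branch reproduces the corruption described there, while the 7bit branch returns the link unchanged. That is consistent with the corruption appearing only for messages whose body was not actually QP-encoded.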
