I'm having some trouble with email encoding. I am reading an HTML file from disk and sending it through Gmail. When I open the HTML in the browser, it looks great. When I copy the HTML string from Visual Studio and save it as an HTML file, it looks great. When I receive the email, it contains a bunch of invalid characters. Even the list bullets are messed up! I'm sure this is an encoding issue, but the file is encoded as UTF-8 and looks good until it's converted to RAW and sent through Gmail.

Here is the process. We read from a .docx using the OpenXML SDK, then use HtmlConverter to save the document as HTML. Later, the HTML is read back from the file, converted to RAW format, and sent through the Gmail API.

Here are the relevant code snippets:

This is where we save our HTML file using HtmlConverter.

HtmlConverterSettings settings = new HtmlConverterSettings()
{
    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
    FabricateCssClasses = true,
    RestrictToSupportedLanguages = false,
    RestrictToSupportedNumberingFormats = false,
};

XElement htmlElement = HtmlConverter.ConvertToHtml( wdWordDocument, settings );
var html = new XDocument(
    new XDocumentType( "html", null, null, null ),
    htmlElement );

var htmlString = html.ToString( SaveOptions.DisableFormatting );
File.WriteAllText( destFileName.FullName, htmlString, Encoding.UTF8 );

This is where we read the stored HTML and convert it for sending via Gmail. (We use MimeKit for the conversion.)

// Create the message using MimeKit/System.Net.Mail.MailMessage
MailMessage msg = new MailMessage();
msg.Subject = strEmailSubject; // Subject
msg.From = new MailAddress( strUserEmail ); // Sender
msg.To.Add( new MailAddress( row.email ) ); // Recipient
msg.BodyEncoding = Encoding.UTF8;
msg.IsBodyHtml = true; 

// We need to loop through our HTML Document and replace the images with a CID so that they will display inline
var vHtmlDoc = new HtmlAgilityPack.HtmlDocument();
vHtmlDoc.Load( row.file ); // Read the body, from HTML file
...
msg.Body = vHtmlDoc.DocumentNode.OuterHtml;

// Convert our System.Net.Mail.MailMessage to RAW with Base64 encoding for Gmail
MimeMessage mimeMessage = MimeMessage.CreateFromMailMessage( msg );

Google.Apis.Gmail.v1.Data.Message message = new Google.Apis.Gmail.v1.Data.Message();
message.Raw = Base64UrlEncode( mimeMessage.ToString() );
var result = vGMailService.Users.Messages.Send( message, "me" ).Execute();

And this is how we are base64 encoding:

private static string Base64UrlEncode( string input )
{
    var inputBytes = System.Text.Encoding.UTF8.GetBytes( input );
    // Special "url-safe" base64 encode.
    return Convert.ToBase64String( inputBytes )
                  .Replace( '+', '-' )
                  .Replace( '/', '_' )
                  .Replace( "=", "" );
}

The email ends up as "Content-Type: multipart/mixed" with two alternatives. One is

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

and the other is

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Both the plain text and the HTML contain strings like =C3=A2=E2=82=AC=E2=84=A2 for an apostrophe, and the HTML portion contains an HTML header with weird "3D" characters in it.

<meta charset=3D"UTF-8"><title></title><meta name=3D"Generator"=
 content=3D"PowerTools for Open XML">

None of this weirdness was in the HTML prior to converting to Base64 and sending.

Any ideas what the problem could be? Does this have anything to do with UTF-8 and MimeKit?

2 Answers

This is what your code should look like when getting the "raw" message data for use with Google's API:

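using (var stream = new MemoryStream ()) {
    // Serialize the MimeMessage as raw bytes instead of going through ToString().
    mimeMessage.WriteTo (stream);

    var buffer = stream.ToArray ();
    // URL-safe base64 for the Raw field.
    var base64 = Convert.ToBase64String (buffer)
        .Replace( '+', '-' )
        .Replace( '/', '_' )
        .Replace( "=", "" );

    // "message" is the Google.Apis.Gmail.v1.Data.Message from the question.
    message.Raw = base64;
}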

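As brandon927 pointed out, the content of the text/html MIME part has been quoted-printable encoded. This is a MIME encoding used for transport, to ensure that the content fits within the 7-bit ASCII range.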

You need to decode it to get the original HTML back.
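To make the decoding step concrete, here is a minimal sketch of a quoted-printable decoder that uses only the .NET base class library. The class name `QuotedPrintableSketch` and its `Decode` method are hypothetical names chosen for illustration, not part of MimeKit, and the sketch assumes well-formed input; in real code MimeKit's own decoding is the better choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintableSketch
{
    // Decodes quoted-printable text into a string using the given charset.
    // Assumes well-formed, 7-bit ASCII input (illustration only).
    public static string Decode(string input, Encoding charset)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '=')
            {
                bytes.Add((byte)c); // literal ASCII character
                continue;
            }
            // "=" directly before a line break is a soft line break: drop both.
            if (i + 1 < input.Length && input[i + 1] == '\n') { i += 1; continue; }
            if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') { i += 2; continue; }
            // Otherwise "=XY" is a single byte written as two hex digits.
            if (i + 2 < input.Length)
            {
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 2;
            }
        }
        return charset.GetString(bytes.ToArray());
    }
}
```

Passing the header fragment from the question through `QuotedPrintableSketch.Decode(..., Encoding.UTF8)` turns each `=3D` back into `=` and removes the `=` at the end of the wrapped line, which gives back the original `<meta charset="UTF-8">` markup.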

With MimeKit, this is done for you if you use mimeMessage.HtmlBody, or if you cast the MimeEntity representing the text/html part to a TextPart and access its Text property.
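As a rough sketch of that round trip, the following builds a message with a quoted-printable HTML part, writes it out as raw bytes, parses it again, and reads the decoded HTML. It assumes the MimeKit NuGet package is referenced; the class name `HtmlBodySketch` and the method `RoundTrip` are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;
using MimeKit;

public static class HtmlBodySketch
{
    // Serializes an HTML part as quoted-printable, parses the raw bytes
    // back into a MimeMessage, and returns the decoded HTML body.
    public static string RoundTrip(string html)
    {
        var part = new TextPart("html") { Text = html };
        part.ContentTransferEncoding = ContentEncoding.QuotedPrintable;

        var original = new MimeMessage { Body = part };

        using (var stream = new MemoryStream())
        {
            original.WriteTo(stream);   // raw form, including "=3D" sequences
            stream.Position = 0;

            var parsed = MimeMessage.Load(stream);
            return parsed.HtmlBody;     // MimeKit decodes the part here
        }
    }
}
```

The raw stream is what gets base64url-encoded for the API, and `HtmlBody` is the way to see the text as a mail client would render it.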

Answered 2017-05-06T00:51:12.797
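The answer to your question is: there is no problem. This is simply how the raw message looks with quoted-printable encoding. If you send an email and view its source, Gmail shows it this way too.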
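The `=3D` sequences fit that explanation. The apostrophe sample from the question can be checked separately with a few lines of base-class-library code. This is only a sketch: the class name `ApostropheCheck` and its members are hypothetical, and it assumes the apostrophe in the source document is the typographic one, U+2019.

```csharp
using System;
using System.Text;

public static class ApostropheCheck
{
    // The bytes behind "=C3=A2=E2=82=AC=E2=84=A2" from the question.
    public static readonly byte[] Observed =
        { 0xC3, 0xA2, 0xE2, 0x82, 0xAC, 0xE2, 0x84, 0xA2 };

    // What those bytes mean when read as UTF-8.
    public static string DecodeObserved()
    {
        return Encoding.UTF8.GetString(Observed);
    }

    // Quoted-printable style rendering of a string's UTF-8 bytes,
    // with every byte escaped as "=XY".
    public static string ToEscapedUtf8(string text)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
            sb.Append('=').Append(b.ToString("X2"));
        return sb.ToString();
    }
}
```

`DecodeObserved()` yields the three characters U+00E2, U+20AC, U+2122 (displayed as "â€™"), while `ToEscapedUtf8("\u2019")` yields `=E2=80=99`. If the source character really is U+2019, the observed bytes are what its UTF-8 encoding looks like after being read as Windows-1252 and encoded to UTF-8 again. That would suggest checking the step that reads the HTML file, for example by passing an explicit encoding when loading it, before looking at MimeKit or quoted-printable.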

Answered 2017-05-05T16:55:21.810