c# - iCalendar RFC 2445 section 4.1 content line folding

Question

I'm creating a simple iCalendar in C# and have found the content line folding described in section 4.1 of RFC 2445 to be quite a headache (for me, anyway :-).

http://www.apps.ietf.org/rfc/rfc2445.html#sec-4.1

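For long lines, you escape certain characters (backslash, semicolon, comma and newline, I believe) and then fold the line so that no line is longer than 75 octets. I've found several straightforward approaches online. The simplest is to replace the offending characters with their escaped versions and then insert a CRLF (followed by a space) every 75 characters. Something like: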

// too simple, could break at an escape sequence boundary or multi-byte character may overflow 75 octets
txt = txt.Replace(@"\", "\\\\").Replace(";", "\\;").Replace(",", "\\,").Replace("\r\n", "\\n");
var regex = new System.Text.RegularExpressions.Regex( ".{75}");
var escape_and_folded = regex.Replace( txt, "$0\r\n ");

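I see two problems. First, the CRLF could be inserted in the middle of an escape sequence. For example, the insertion could turn the escaped newline sequence "\n" into "\CRLF", with the "n" ending up on the next line. Second, there are multi-byte characters: because the count is done in characters rather than octets, a line could end up longer than 75 octets.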

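A simple solution is to walk the string character by character, escaping and folding as you go, but that seems rather brute force. Does anyone have a more elegant solution?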
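For illustration, the character-by-character approach could be sketched roughly as follows. The names ICalFolding and FoldLine are made up for this example, and the input is assumed to be a single content line that has already been escaped. The sketch counts UTF-8 octets rather than characters and keeps surrogate pairs and backslash escape pairs together, so neither problem above should occur.

```csharp
using System;
using System.Text;

public static class ICalFolding
{
    // Folds one already-escaped content line so that no physical line
    // exceeds maxOctets UTF-8 octets (the CRLF itself is not counted).
    // Continuation lines begin with one space, which counts toward the limit.
    public static string FoldLine(string line, int maxOctets = 75)
    {
        var sb = new StringBuilder(line.Length + 16);
        int used = 0; // octets already written on the current physical line

        for (int i = 0; i < line.Length; )
        {
            // Decide how many UTF-16 code units form one indivisible unit.
            int units = 1;
            if (i + 1 < line.Length)
            {
                bool surrogatePair = char.IsHighSurrogate(line[i]) && char.IsLowSurrogate(line[i + 1]);
                // Assumption: in escaped text a backslash always starts a two-character escape.
                bool escapePair = line[i] == '\\' && !char.IsSurrogate(line[i + 1]);
                if (surrogatePair || escapePair) units = 2;
            }

            int size = Encoding.UTF8.GetByteCount(line.Substring(i, units));
            if (used + size > maxOctets)
            {
                sb.Append("\r\n "); // fold: CRLF followed by a single space
                used = 1;           // the leading space occupies one octet
            }

            sb.Append(line, i, units);
            used += size;
            i += units;
        }
        return sb.ToString();
    }
}
```

Calling ICalFolding.FoldLine on a line of 80 ASCII characters should give a first line of 75 characters and a continuation line holding a space plus the remaining 5.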

score 2 · Accepted Answer

First, make sure you are looking at RFC 5545; RFC 2445 is obsolete. You can find my PHP implementation here:

https://github.com/fruux/sabre-vobject/blob/master/lib/Property.php#L252

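In PHP we have the mb_strcut function. I'm not sure whether there is a .NET equivalent, but it would at least make things a lot simpler. So far I haven't had any problems with escape sequences (\) being folded in half. A good parser unfolds the lines first and only then processes the escapes, especially since which characters must be escaped depends on the actual property (sometimes , or ; is escaped, sometimes not).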
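To make the "unfold first, then unescape" order concrete, here is a minimal C# sketch of the reading side. The names ICalParsing, Unfold and UnescapeText are hypothetical, and the unescaping shown assumes a TEXT-style value where \\, \;, \, and \n (or \N) are the escapes in use; other property types would need different handling.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ICalParsing
{
    // Step 1: remove every CRLF that is immediately followed by one space or tab.
    public static string Unfold(string folded)
    {
        return Regex.Replace(folded, "\r\n[ \t]", "");
    }

    // Step 2: reverse the backslash escaping of a TEXT value.
    public static string UnescapeText(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (c == '\\' && i + 1 < value.Length)
            {
                char next = value[++i];
                // \n and \N stand for a line break; any other escaped
                // character (\\ \; \,) is taken literally.
                if (next == 'n' || next == 'N') sb.Append('\n');
                else sb.Append(next);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Because unfolding runs first, an escape sequence that was split by a fold (a backslash at the end of one line and "n" at the start of the next) is rejoined before UnescapeText ever sees it.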

score 1 · Accepted Answer

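I tried your solution. It works, except that it also folds some lines that are shorter than 75 octets. So I rewrote the code the traditional way (i.e. without regular expressions, which I do miss), as shown below.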

    public static string FoldLines(this string value, int max, string newline = "\r\n")
    {
        var lines = value.Split(new string[]{newline}, System.StringSplitOptions.RemoveEmptyEntries);
        using (var ms = new System.IO.MemoryStream(value.Length))
        {
            var crlf = Encoding.UTF8.GetBytes(newline); //CRLF
            var crlfs = Encoding.UTF8.GetBytes(string.Format("{0} ", newline)); //CRLF and SPACE
            foreach (var line in lines)
            {
                var bytes = Encoding.UTF8.GetBytes(line);
                var len = Encoding.UTF8.GetByteCount(line);
                if (len <= max)
                {
                    ms.Write(bytes, 0, len);
                    ms.Write(crlf, 0, crlf.Length); 
                }
                else
                {
                    var blen = len / max; //calculate block length
                    var rlen = len % max; //calculate remaining length
                    var b = 0;
                    while (b < blen)
                    {
                        ms.Write(bytes, (b++) * max, max);
                        ms.Write(crlfs, 0, crlfs.Length); 
                    }
                    if (rlen > 0)
                    {
                        ms.Write(bytes, blen * max, rlen);
                        ms.Write(crlf, 0, crlf.Length);
                    }
                }
            }

            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

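Notes:

I was as elegant as I could be: I don't parse the string character by character, but in blocks of octets (the block size being given by max).
It is best to call the function on the generated VCALENDAR object, so that every content line is checked and folded where necessary.

Escaping of special literals is performed only in TEXT-related properties such as DESCRIPTION, SUMMARY, etc. It is implemented in the following extension methods: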

public static string Replace(this string value, IEnumerable<Tuple<string, string>> pairs)
{
    foreach (var pair in pairs) value = value.Replace(pair.Item1, pair.Item2);
    return value;
}

public static string EscapeStrings(this string value)
{
    return value.Replace(new List<Tuple<string, string>> 
    { 
        new Tuple<string, string>(@"\", "\\\\"),
        new Tuple<string, string>(";",  @"\;"),
        new Tuple<string, string>(",",  @"\,"),
        new Tuple<string, string>("\r\n",  @"\n"),
    });
}

score 0 · Accepted Answer

reexmonkey's solution writes 76 characters on the folded middle lines, because it doesn't subtract the extra space character added by crlfs.
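The arithmetic is: a continuation line is one leading space plus the block that follows it, so a block of max = 75 octets yields 1 + 75 = 76 octets on that line. A small helper along these lines can be used to check folded output; FoldCheck and LongestLineOctets are names invented for this sketch, not part of either answer's code.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FoldCheck
{
    // Returns the largest UTF-8 octet count of any physical line in the
    // folded text. The CRLF line terminators are not counted.
    public static int LongestLineOctets(string folded)
    {
        return folded
            .Split(new[] { "\r\n" }, StringSplitOptions.None)
            .Max(l => Encoding.UTF8.GetByteCount(l));
    }
}
```

For example, a first line of 75 'a' characters followed by a continuation line of a space plus 75 'b' characters measures 76, while a space plus 74 'b' characters stays at 75.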

I rewrote the folding function to correct this:

public static string FoldLines(string value, int max, string newline = "\r\n")
{
    var lines = value.Split(new string[] { newline }, System.StringSplitOptions.RemoveEmptyEntries);
    using (var ms = new System.IO.MemoryStream(value.Length))
    {
        var crlf = Encoding.UTF8.GetBytes(newline); //CRLF
        var crlfs = Encoding.UTF8.GetBytes(string.Format("{0} ", newline)); //CRLF and SPACE
        foreach (var line in lines)
        {
            var bytes = Encoding.UTF8.GetBytes(line);
            var len = Encoding.UTF8.GetByteCount(line);
            if (len <= max)
            {
                ms.Write(bytes, 0, len);
                ms.Write(crlf, 0, crlf.Length);
            }
            else
            {
                var offset = 0; //current offset position
                var count = max; //characters to take
                while (offset + count < len)
                {
                    ms.Write(bytes, offset, count);
                    ms.Write(crlfs, 0, crlfs.Length);
                    offset += count;
                    count = max - 1;
                }
                count = len - offset; //remaining characters
                if (count > 0)
                {
                    ms.Write(bytes, offset, count);
                    ms.Write(crlf, 0, crlf.Length);
                }
            }
        }

        return Encoding.UTF8.GetString(ms.ToArray());
    }
}

I also added an extra tuple to the EscapeStrings function:

public static string ReplaceText(string value, IEnumerable<Tuple<string, string>> pairs)
{
    foreach (var pair in pairs) value = value.Replace(pair.Item1, pair.Item2);
    return value;
}
public static string EscapeStrings(string value)
{
    return ReplaceText(value, new List <Tuple<string, string>>
    {
        new Tuple<string, string>(@"\", "\\\\"),
        new Tuple<string, string>(";",  @"\;"),
        new Tuple<string, string>(",",  @"\,"),
        new Tuple<string, string>("\r\n",  @"\n"),
        new Tuple<string, string>("\n",  @"\n"),
    });
}
