I am trying to write two functions, `escape(text, delimiter)` and `unescape(text, delimiter)`, with the following properties:

  1. The result of `escape` does not contain `delimiter`.

  2. `unescape` is the inverse of `escape`, i.e.

    unescape(escape(text, delimiter), delimiter) == text
    

    for all values of `text` and `delimiter`.

Restricting the allowed values of `delimiter` is acceptable.


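Background: I want to build a delimiter-separated string of values. To be able to extract the same list from that string again, I have to make sure the individual values do not contain the delimiter.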


What I tried: I came up with a simple solution (pseudocode):

    escape(text, delimiter):   return text.Replace("\", "\\").Replace(delimiter, "\d")
    unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\\", "\")

However, I found that property 2 fails for the test string `"\d<delimiter>"`. At the moment I have the following working solution:

    escape(text, delimiter):   return text.Replace("\", "\b").Replace(delimiter, "\d")
    unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\b", "\")

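This seems to work as long as `delimiter` is not `\`, `b`, or `d` (which is fine; I don't want to use those as delimiters anyway). However, since I have not formally proven its correctness, I am afraid I have missed a case that violates one of the properties. Since this is such a common problem, I assume there is already a well-known, proven-correct algorithm for it, hence my question (see title).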
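
Short of a formal proof, a brute-force search over all short strings from a small alphabet can at least hunt for counterexamples to the two properties. The following is only a minimal sketch, assuming C# and the second scheme above; the names `EscapeCheck`, `AllStrings`, and `Counterexamples` are hypothetical and not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EscapeCheck
{
    // Second scheme from the question: "\" -> "\b", delimiter -> "\d".
    public static string Escape(string text, string delimiter) =>
        text.Replace("\\", "\\b").Replace(delimiter, "\\d");

    public static string Unescape(string text, string delimiter) =>
        text.Replace("\\d", delimiter).Replace("\\b", "\\");

    // Enumerates every string over `alphabet` with length 0..maxLength.
    static IEnumerable<string> AllStrings(string alphabet, int maxLength)
    {
        var current = new List<string> { "" };
        for (int length = 0; ; length++)
        {
            foreach (string s in current) yield return s;
            if (length >= maxLength) yield break;
            // Extend every string of the current length by one character.
            current = current.SelectMany(s => alphabet.Select(c => s + c)).ToList();
        }
    }

    // Yields every input that violates property 1 or property 2.
    public static IEnumerable<string> Counterexamples(
        string delimiter, string alphabet, int maxLength)
    {
        foreach (string text in AllStrings(alphabet, maxLength))
        {
            string escaped = Escape(text, delimiter);
            bool leaksDelimiter = escaped.Contains(delimiter);       // property 1
            bool roundTrips = Unescape(escaped, delimiter) == text;  // property 2
            if (leaksDelimiter || !roundTrips) yield return text;
        }
    }
}
```

For example, `EscapeCheck.Counterexamples(";", "\\bd;", 5)` tries every input built from `\`, `b`, `d`, and `;` up to length 5. An empty result is evidence, not a proof. With `d` as the delimiter, the input `d` itself is reported, because its escaped form `\d` still contains the delimiter.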

2 Answers

Your first algorithm is correct.

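The bug is in the implementation of `unescape()`: you need to replace `\d` with `delimiter` and `\\` with `\` in the same pass. You cannot do it with several successive `Replace()` calls like that.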
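
To make the difference concrete, here is a small sketch that puts the question's chained `Replace()` calls next to a single-pass variant. It uses `Regex.Replace` with a match evaluator, which is just one way to get a single left-to-right pass. The class and method names are made up for illustration, and the delimiter is assumed to be neither `\` nor `d`.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglePassUnescape
{
    // First scheme from the question: "\" -> "\\", delimiter -> "\d".
    public static string Escape(string text, string delimiter) =>
        text.Replace("\\", "\\\\").Replace(delimiter, "\\d");

    // The question's unescape: two independent passes. The first pass can
    // match a "\d" whose backslash is really the second half of a "\\" pair.
    public static string UnescapeChained(string text, string delimiter) =>
        text.Replace("\\d", delimiter).Replace("\\\\", "\\");

    // One left-to-right pass: each match consumes a whole escape pair,
    // so "\\" and "\d" cannot be confused with each other.
    public static string UnescapeSinglePass(string text, string delimiter) =>
        Regex.Replace(text, @"\\(\\|d)",
            m => m.Groups[1].Value == "d" ? delimiter : "\\");
}
```

With `;` as the delimiter, the input `\d;` escapes to `\\d\d`. The chained version turns that into `\;;`, while the single-pass version restores `\d;`.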

Here is some sample C# code for safely quoting the values of a separator-delimited string:

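    // Requires: using System; using System.Collections.Generic;
    //           using System.Linq; using System.Text;
    // These methods belong inside a static class (the last two are extension methods).
    // CommonHelper.CheckNullReference is not defined in this snippet; presumably a null-argument check.
    static string QuoteSeparator(string str,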
        char separator, char quoteChar, char otherChar) // "~" -> "~~"     ";" -> "~s"
    {
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (c == quoteChar)
            {
                sb.Append(quoteChar);
                sb.Append(quoteChar);
            }
            else if (c == separator)
            {
                sb.Append(quoteChar);
                sb.Append(otherChar);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString(); // no separator in the result -> Join/Split is safe
    }
    static string UnquoteSeparator(string str,
        char separator, char quoteChar, char otherChar) // "~~" -> "~"     "~s" -> ";"
    {
        var sb = new StringBuilder(str.Length);
        bool isQuoted = false;
        foreach (char c in str)
        {
            if (isQuoted)
            {
                if (c == otherChar)
                    sb.Append(separator);
                else
                    sb.Append(c);
                isQuoted = false;
            }
            else
            {
                if (c == quoteChar)
                    isQuoted = true;
                else
                    sb.Append(c);
            }
        }
        if (isQuoted)
            throw new ArgumentException("input string is not correctly quoted");
        return sb.ToString(); // ";" are restored
    }

    /// <summary>
    /// Encodes the given strings as a single string.
    /// </summary>
    /// <param name="input">The strings.</param>
    /// <param name="separator">The separator.</param>
    /// <param name="quoteChar">The quote char.</param>
    /// <param name="otherChar">The other char.</param>
    /// <returns></returns>
    public static string QuoteAndJoin(this IEnumerable<string> input,
        char separator = ';', char quoteChar = '~', char otherChar = 's')
    {
        CommonHelper.CheckNullReference(input, "input");
        if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
            throw new ArgumentException("cannot quote: ambiguous format");
        return string.Join(new string(separator, 1), (from str in input select QuoteSeparator(str, separator, quoteChar, otherChar)).ToArray());
    }

    /// <summary>
    /// Decodes the strings encoded in a single string.
    /// </summary>
    /// <param name="encoded">The encoded.</param>
    /// <param name="separator">The separator.</param>
    /// <param name="quoteChar">The quote char.</param>
    /// <param name="otherChar">The other char.</param>
    /// <returns></returns>
    public static IEnumerable<string> SplitAndUnquote(this string encoded,
        char separator = ';', char quoteChar = '~', char otherChar = 's')
    {
        CommonHelper.CheckNullReference(encoded, "encoded");
        if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
            throw new ArgumentException("cannot unquote: ambiguous format");
        return from s in encoded.Split(separator) select UnquoteSeparator(s, separator, quoteChar, otherChar);
    }
answered 2012-06-14T13:08:12.650

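If the delimiter actually is `\`, `b`, or `d`, use an alternative replacement for that case, and apply the same alternative replacement in the `unescape` algorithm as well.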
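
One possible reading of this idea, sketched under assumptions: derive the escape character and the two code characters from the delimiter, so that `escape` and `unescape` always agree on them. The candidate list `\`, `b`, `d`, `~`, the class name `AdaptiveEscape`, and the helper `PickCodes` are all hypothetical choices for illustration. The sketch also processes the text in a single pass rather than with chained `Replace()` calls.

```csharp
using System;
using System.Linq;
using System.Text;

static class AdaptiveEscape
{
    // Picks (escape char, code for the escape char, code for the delimiter).
    // Candidates are tried in order; with four candidates and one delimiter,
    // at least three always remain. For a delimiter outside the candidate
    // list this yields '\', 'b', 'd', as in the question.
    static (char Esc, char EscCode, char DelimCode) PickCodes(char delimiter)
    {
        char[] free = "\\bd~".Where(c => c != delimiter).ToArray();
        return (free[0], free[1], free[2]);
    }

    public static string Escape(string text, char delimiter)
    {
        var (esc, escCode, delimCode) = PickCodes(delimiter);
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c == esc) sb.Append(esc).Append(escCode);
            else if (c == delimiter) sb.Append(esc).Append(delimCode);
            else sb.Append(c);
        }
        return sb.ToString();
    }

    public static string Unescape(string text, char delimiter)
    {
        var (esc, escCode, delimCode) = PickCodes(delimiter);
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c != esc) { sb.Append(c); continue; }
            // An escape char must be followed by one of the two codes.
            if (++i == text.Length) throw new FormatException("dangling escape character");
            if (text[i] == escCode) sb.Append(esc);
            else if (text[i] == delimCode) sb.Append(delimiter);
            else throw new FormatException("unknown escape sequence");
        }
        return sb.ToString();
    }
}
```

With `\` as the delimiter, this sketch falls back to `b` as the escape character, so `b\d` is escaped as `bdb~d`, which contains no backslash.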

answered 2012-06-14T13:12:25.140