c# - Limiting the UTF-8 encoded byte length of a string

Question

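I need to limit the length of the byte[] produced when a string is encoded as UTF-8. For example, the byte[] length must be less than or equal to 1000. At first I wrote the following code: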

            int maxValue = 1000;

            if (text.Length > maxValue)
                text = text.Substring(0, maxValue);
            var textInBytes = Encoding.UTF8.GetBytes(text);

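This works fine if the string contains only ASCII characters, because each character is 1 byte. But outside that range a character can take 2, 3, or even 4 bytes in UTF-8, and then the code above no longer guarantees the limit. So to solve this, I wrote the following: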

            List<byte> textInBytesList = new List<byte>();
            char[] textInChars = text.ToCharArray();
            for (int a = 0; a < textInChars.Length; a++)
            {
                byte[] valueInBytes = Encoding.UTF8.GetBytes(textInChars, a, 1);
                if ((textInBytesList.Count + valueInBytes.Length) > maxValue)
                    break;

                textInBytesList.AddRange(valueInBytes);
            }

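I haven't tested this code, but I'm fairly sure it will work the way I want. However, I don't like the way it's done. Is there a better way to do this? Is there something I'm missing or not aware of?

Thank you.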

score 1 · Accepted Answer

This is my first post on Stack Overflow, so be gentle! This method should take care of things for you fairly quickly.

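    public static byte[] GetBytes(string text, int maxArraySize, Encoding encoding) {
        if (string.IsNullOrEmpty(text)) return null;

        // Every char encodes to at least one byte, so at most maxArraySize chars can fit.
        int tail = Math.Min(text.Length, maxArraySize);
        // Do not start between the two halves of a surrogate pair.
        if (tail > 0 && tail < text.Length && char.IsSurrogatePair(text, tail - 1)) --tail;
        int size = encoding.GetByteCount(text.Substring(0, tail));
        while (tail > 0 && size > maxArraySize) {
            // Step back one whole character: two chars for a surrogate pair, otherwise one.
            int step = (tail >= 2 && char.IsSurrogatePair(text, tail - 2)) ? 2 : 1;
            size -= encoding.GetByteCount(text.Substring(tail - step, step));
            tail -= step;
        }

        return encoding.GetBytes(text.Substring(0, tail));
    }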

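It is similar to what you are doing, but without the overhead of the List and without having to count from the beginning of the string each time. I start from the other end of the string, on the assumption, of course, that every character must be at least one byte. So there is no point in starting the backward walk any further into the string than maxArraySize (or the total length of the string).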
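As a rough illustration of the byte counts this reasoning relies on, the sketch below prints how many chars and UTF-8 bytes a few sample strings occupy. The class name `ByteCountDemo` and the sample strings are made up for this example. It assumes the default `Encoding.UTF8` behavior, where an unpaired surrogate is replaced by U+FFFD when encoding.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    static void Main()
    {
        // Label and sample pairs; escapes are used so the source file encoding does not matter.
        (string Label, string Sample)[] samples =
        {
            ("ASCII 'a'",            "a"),          // 1 char, 1 byte
            ("U+00E9 (e acute)",     "\u00E9"),     // 1 char, 2 bytes
            ("U+4E2D (CJK)",         "\u4E2D"),     // 1 char, 3 bytes
            ("U+1F600 (emoji)",      "\U0001F600"), // 2 chars (surrogate pair), 4 bytes
            ("lone high surrogate",  "\uD83D"),     // 1 char, encoded as U+FFFD: 3 bytes
        };

        foreach (var (label, sample) in samples)
        {
            Console.WriteLine(
                $"{label}: {sample.Length} char(s), {Encoding.UTF8.GetByteCount(sample)} byte(s)");
        }
    }
}
```

In every case the byte count is at least the char count, which is why no more than maxArraySize chars can ever fit. The last two rows also show why a cut point should not fall between the two halves of a surrogate pair: the pair takes 4 bytes together, while a lone half is counted (and encoded) as a 3-byte replacement character.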

You can then call the method like this:

        byte[] bytes = GetBytes(text, 1000, Encoding.UTF8);
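If only UTF-8 is needed, a different technique (not the one in this answer) is to encode the whole string once and then back the cut point up to a character boundary in the byte array. This is a minimal sketch under that assumption; `Utf8Truncation` and `GetUtf8BytesTruncated` are hypothetical names, and unlike the method above it treats a null string as empty rather than returning null.

```csharp
using System;
using System.Text;

static class Utf8Truncation
{
    // Hypothetical helper, UTF-8 only: returns at most maxBytes bytes,
    // never cutting through a multi-byte sequence.
    public static byte[] GetUtf8BytesTruncated(string text, int maxBytes)
    {
        if (maxBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));

        byte[] all = Encoding.UTF8.GetBytes(text ?? string.Empty);
        if (all.Length <= maxBytes) return all;

        // all[cut] is the first byte that would be dropped. In UTF-8, continuation
        // bytes have the bit pattern 10xxxxxx. If all[cut] is one, the cut lies
        // inside a sequence, so move back until all[cut] is a lead byte.
        int cut = maxBytes;
        while (cut > 0 && (all[cut] & 0xC0) == 0x80) cut--;

        byte[] result = new byte[cut];
        Array.Copy(all, result, cut);
        return result;
    }
}
```

For example, `Utf8Truncation.GetUtf8BytesTruncated("a\u00E9", 2)` returns a single byte (`0x61`), because the two-byte sequence for U+00E9 would not fit completely. The trade-off is that the whole string is encoded even when most of it is discarded, so for very long inputs the backward walk shown earlier may be cheaper.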
