c# - 整数的可变长度编码

Question

在 C# 中对无符号整数值进行可变长度编码的最佳方法是什么？

“实际意图是将可变长度编码的整数（字节）附加到文件头。”

例如：“Content-Length” - Http Header

这可以通过对下面的逻辑进行一些更改来实现。

我写了一些代码来做到这一点....

score 17 · Accepted Answer

我使用的一种使较小的值使用较少字节的方法是对 7 位数据 + 1 位开销 pr 进行编码。字节。

编码仅适用于从零开始的正值，但如果需要也可以修改以处理负值。

编码的工作方式是这样的：

获取值的最低 7 位并将它们存储在一个字节中，这就是您要输出的内容
将值向右移动 7 位，去掉刚刚抓取的那 7 位
如果该值非零（即，从它移开 7 位之后），请在输出之前设置要输出的字节的高位
输出字节
如果该值非零（即导致设置高位的相同检查），请返回并从头开始重复这些步骤

解码：

从位位置 0 开始
从文件中读取一个字节
存储高位是否设置，屏蔽掉
或将字节的其余部分转换为最终值，在您所在的位位置
如果设置了高位，则将位位置增加 7，然后重复这些步骤，跳过第一个（不要重置位位置）

          39 32 31 24 23 16 15 8 7 0
值：|DDDDDDDD|CCCCCCCC|BBBBBBBB|AAAAAAAA|
编码：|0000DDDD|xDDDDCCC|xCCCCCBB|xBBBBBBA|xAAAAAAA| （注意，以相反的顺序存储）

如您所见，由于控制位的开销，编码值可能会占用一个刚刚使用一半的额外字节。如果将此扩展为 64 位值，则额外的字节将被完全使用，因此仍然只有一个字节的额外开销。

注意：由于编码一次存储一个字节的值，总是以相同的顺序，大端或小端系统不会改变它的布局。最低有效字节总是首先存储，等等。

范围及其编码大小：

          0 - 127 : 1 个字节
        128 - 16.383 : 2 个字节
     16.384 - 2.097.151：3 个字节
  2.097.152 - 268.435.455：4 个字节
268.435.456 - max-int32：5 个字节

这是两者的 C# 实现：

void Main()
{
    using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Create))
    using (BinaryWriter writer = new BinaryWriter(stream))
        writer.EncodeInt32(123456789);

    using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Open))
    using (BinaryReader reader = new BinaryReader(stream))
        reader.DecodeInt32().Dump();
}

// Define other methods and classes here

public static class Extensions
{
    /// <summary>
    /// Encodes the specified <see cref="Int32"/> value with a variable number of
    /// bytes, and writes the encoded bytes to the specified writer.
    /// </summary>
    /// <param name="writer">
    /// The <see cref="BinaryWriter"/> to write the encoded value to.
    /// </param>
    /// <param name="value">
    /// The <see cref="Int32"/> value to encode and write to the <paramref name="writer"/>.
    /// </param>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="writer"/> is <c>null</c>.</para>
    /// </exception>
    /// <exception cref="ArgumentOutOfRangeException">
    /// <para><paramref name="value"/> is less than 0.</para>
    /// </exception>
    /// <remarks>
    /// See <see cref="DecodeInt32"/> for how to decode the value back from
    /// a <see cref="BinaryReader"/>.
    /// </remarks>
    public static void EncodeInt32(this BinaryWriter writer, int value)
    {
        if (writer == null)
            throw new ArgumentNullException("writer");
        if (value < 0)
            throw new ArgumentOutOfRangeException("value", value, "value must be 0 or greater");

        do
        {
            byte lower7bits = (byte)(value & 0x7f);
            value >>= 7;
            if (value > 0)
                lower7bits |= 128;
            writer.Write(lower7bits);
        } while (value > 0);
    }

    /// <summary>
    /// Decodes a <see cref="Int32"/> value from a variable number of
    /// bytes, originally encoded with <see cref="EncodeInt32"/> from the specified reader.
    /// </summary>
    /// <param name="reader">
    /// The <see cref="BinaryReader"/> to read the encoded value from.
    /// </param>
    /// <returns>
    /// The decoded <see cref="Int32"/> value.
    /// </returns>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="reader"/> is <c>null</c>.</para>
    /// </exception>
    public static int DecodeInt32(this BinaryReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        bool more = true;
        int value = 0;
        int shift = 0;
        while (more)
        {
            byte lower7bits = reader.ReadByte();
            more = (lower7bits & 128) != 0;
            value |= (lower7bits & 0x7f) << shift;
            shift += 7;
        }
        return value;
    }
}

score 1 · Accepted Answer

1

如果小值比大值更常见，您可以使用Golomb 编码。

于 2010-08-25T06:16:19.660 回答

score 1 · Accepted Answer

我知道这个问题是几年前提出的，但是对于 MIDI 开发人员，我想分享一些来自我正在从事的个人 MIDI 项目的代码。该代码块基于 Paul Messick 的《Maximum MIDI》一书中的一段（此示例是根据我自己的需要调整的版本，但概念就在那里……）。

    public struct VariableLength
    {
        // Variable Length byte array to int
        public VariableLength(byte[] bytes)
        {
            int index = 0;
            int value = 0;
            byte b;
            do
            {
                value = (value << 7) | ((b = bytes[index]) & 0x7F);
                index++;
            } while ((b & 0x80) != 0);

            Length = index;
            Value = value;
            Bytes = new byte[Length];
            Array.Copy(bytes, 0, Bytes, 0, Length);
        }

        // Variable Length int to byte array
        public VariableLength(int value)
        {
            Value = value;
            byte[] bytes = new byte[4];
            int index = 0;
            int buffer = value & 0x7F;

            while ((value >>= 7) > 0)
            {
                buffer <<= 8;
                buffer |= 0x80;
                buffer += (value & 0x7F);
            }
            while (true)
            {
                bytes[index] = (byte)buffer;
                index++;
                if ((buffer & 0x80) > 0)
                    buffer >>= 8;
                else
                    break;
            }

            Length = index;
            Bytes = new byte[index];
            Array.Copy(bytes, 0, Bytes, 0, Length);
        }

        // Number of bytes used to store the variable length value
        public int Length { get; private set; }
        // Variable Length Value
        public int Value { get; private set; }
        // Bytes representing the integer value
        public byte[] Bytes { get; private set; }
    }

如何使用：

public void Example()
{   
//Convert an integer into a variable length byte
int varLenVal = 480;     
VariableLength v = new VariableLength(varLenVal);
byte[] bytes = v.Bytes;

//Convert a variable length byte array into an integer
byte[] varLenByte = new byte[2]{131, 96};     
VariableLength v = new VariableLength(varLenByte);
int result = v.Length;
}

score 1 · Accepted Answer

您应该首先制作您的价值的直方图。如果分布是随机的（即，直方图计数的每个 bin 都接近另一个），那么您将无法比该数字的二进制表示更有效地进行编码。

如果您的直方图不平衡（即，如果某些值比其他值更多），那么选择对这些值使用较少位，而对其他不太可能的值使用更多位的编码可能是有意义的。

例如，如果您需要编码的数字小于 15 位的可能性比大于更大的可能性高 2 倍，您可以使用第 16 位来判断，并且只存储/发送 16 位（如果为零，那么接下来的字节将形成一个适合 32 位数字的 16 位数字）。如果它是 1，那么接下来的 25 位将形成一个 32 位的数字。你在这里失去了一点，但因为它不太可能，最终，对于很多数字，你赢得更多的位。

显然，这是一个微不足道的情况，将其扩展到超过 2 种情况是霍夫曼算法，该算法根据数字出现的概率影响接近最优的“码字”。

还有算术编码算法也可以做到这一点（可能还有其他）。

在所有情况下，没有比目前在计算机内存中所做的更有效地存储随机值的解决方案。

您必须考虑与您最终将获得的节省相比，实施这种解决方案需要多长时间和多难才能知道它是否值得。语言本身与这里无关。

score 1 · Accepted Answer

正如Grimbly 指出的那样，存在BinaryReader.Read7BitEncodedInt和BinaryWriter.Write7BitEncodedInt。但是，这些是不能从 BinaryReader 或 -Writer 对象调用的内部方法。

但是，您可以做的是获取内部实现并从阅读器和作者那里复制它：

public static int Read7BitEncodedInt(this BinaryReader br) {
    // Read out an Int32 7 bits at a time.  The high bit 
    // of the byte when on means to continue reading more bytes.
    int count = 0;
    int shift = 0;
    byte b;
    do {
        // Check for a corrupted stream.  Read a max of 5 bytes.
        // In a future version, add a DataFormatException.
        if (shift == 5 * 7)  // 5 bytes max per Int32, shift += 7
            throw new FormatException("Format_Bad7BitInt32");

        // ReadByte handles end of stream cases for us. 
        b = br.ReadByte();
        count |= (b & 0x7F) << shift;
        shift += 7;
    } while ((b & 0x80) != 0); 
    return count;
}   

public static void Write7BitEncodedInt(this BinaryWriter br, int value) {
    // Write out an int 7 bits at a time.  The high bit of the byte,
    // when on, tells reader to continue reading more bytes.
    uint v = (uint)value;   // support negative numbers
    while (v >= 0x80) {
        br.Write((byte)(v | 0x80));
        v >>= 7;
    }
    br.Write((byte)v);
}

当您在项目的任何类中包含此代码时，您将能够在任何BinaryReader/BinaryWriter对象上使用这些方法。它们只是稍作修改以使它们在原始类之外工作（例如通过更改ReadByte()为br.ReadByte()）。评论来自原始来源。

score -1 · Accepted Answer

-1

BinaryReader.Read7BitEncodedInt 方法？

BinaryWriter.Write7BitEncodedInt 方法？

于 2015-04-30T01:08:02.753 回答

c# - 整数的可变长度编码

6 回答 6

Related

Reference