c# - Why does the shortcut used by BitConverter when the start index is divisible by the size of the type being converted to work?

Question

I've recently been looking into how BitConverter works, and from other SO questions I've read that it takes a 'shortcut' when the start index is divisible by the size of the type being converted to: it simply casts a pointer to the byte at that index into a pointer to the target type and dereferences it.

Source for ToInt16 as an example:

public static unsafe short ToInt16(byte[] value, int startIndex) {
     if( value == null)  {
          ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
     }

     if ((uint) startIndex >= value.Length) {
          ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index);
     }

     if (startIndex > value.Length -2) {
          ThrowHelper.ThrowArgumentException(ExceptionResource.Arg_ArrayPlusOffTooSmall);
     }
     Contract.EndContractBlock();

     fixed( byte * pbyte = &value[startIndex]) {
          if( startIndex % 2 == 0) { // data is aligned 
              return *((short *) pbyte);
          }
          else {
              if( IsLittleEndian) { 
                   return (short)((*pbyte) | (*(pbyte + 1) << 8)) ;
              }
              else {
                   return (short)((*pbyte << 8) | (*(pbyte + 1)));                        
              }
          }
     }
}

My question is why does this work regardless of the endianness of the machine, and why doesn't it use the same mechanism when the data is not aligned?

An example to clarify:

I have some bytes in a buffer that I know are in big-endian format, and I want to read a short value from the array at, say, index 5. I also assume that my machine, since it runs Windows, is little-endian.

I would use BitConverter like so, by switching the order of my bytes to little endian:

BitConverter.ToInt16(new byte[] { buffer[6], buffer[5] }, 0)

Assuming the code takes the shortcut, it would do what I want: just cast the bytes as they are, in the order provided, and return the value. But if it didn't have that shortcut code, wouldn't it then reverse the byte order again and give me the wrong value? Or if I instead did:

BitConverter.ToInt16(new byte[] { 0, buffer[6], buffer[5] }, 1)

wouldn't it give me the wrong value since the index is not divisible by 2?

Another situation:

Say I had an array of bytes that contained a short I want to extract, already in little-endian format but starting at an odd offset. Wouldn't the call to BitConverter reverse the order of the bytes, since BitConverter.IsLittleEndian is true and the index is not aligned, thus giving me an incorrect value?

score 3 · Accepted Answer

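The code avoids a hardware exception, a bus error, on processors that do not permit misaligned data access. Such a fault is very expensive: it is typically resolved by kernel code that splits up the bus access and glues the bytes back together. Processors like that were still fairly common when this code was written, at the tail end of the popularity of RISC designs such as MIPS. Older ARM cores and Itanium are other examples, and .NET versions were released for all of them.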

On processors that have no problem with misaligned access, such as Intel/AMD cores, it makes very little difference. Memory is slow.

The code uses IsLittleEndian only because that branch indexes the individual bytes, which of course makes the byte order matter.
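To make this concrete, here is a minimal sketch that reads the same two bytes once from an even index and once from an odd index. The class and method names (`AlignmentDemo`, `ReadBoth`) are made up for illustration. In the source quoted in the question the two calls would go through different branches, yet they should return the same value, because the byte-by-byte branch uses IsLittleEndian to reproduce what the pointer cast would have read.

```csharp
using System;

static class AlignmentDemo
{
    // Hypothetical helper: returns the value read at an even start index
    // and the value read at an odd start index for the same two bytes.
    public static (short Even, short Odd) ReadBoth(byte first, byte second)
    {
        byte[] even = { first, second };     // value starts at index 0
        byte[] odd = { 0, first, second };   // value starts at index 1

        short atEven = BitConverter.ToInt16(even, 0);
        short atOdd = BitConverter.ToInt16(odd, 1);
        return (atEven, atOdd);
    }
}
```

Calling `AlignmentDemo.ReadBoth(0x34, 0x12)` should give two equal values on any machine; which value that is depends on the machine's byte order.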

score 1 · Accepted Answer

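On most architectures there is a performance penalty for accessing data that is not aligned on the proper boundary. On x86 the CPU will let you read from an unaligned address, but at a performance cost. On some architectures you get a CPU fault that the operating system has to trap.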

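I would guess that the cost of having the CPU fix up a read of misaligned data is greater than the cost of reading the individual bytes and doing the shift/or operations. In addition, the code is now portable to platforms where unaligned reads would cause a fault.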
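The same shift/or technique also covers the case from the question where the bytes in the buffer have a known byte order that may differ from the machine's. A minimal sketch, assuming the data is big-endian; `EndianReader` and `ReadInt16BigEndian` are hypothetical names, not part of BitConverter:

```csharp
using System;

static class EndianReader
{
    // Hypothetical helper: reads a big-endian 16-bit value at any offset.
    // It only touches single bytes, so alignment never matters, and it
    // never consults BitConverter.IsLittleEndian, so the result is the
    // same on every machine.
    public static short ReadInt16BigEndian(byte[] buffer, int offset)
    {
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || offset > buffer.Length - 2)
            throw new ArgumentOutOfRangeException(nameof(offset));

        // The first byte is the most significant one in big-endian data.
        return (short)((buffer[offset] << 8) | buffer[offset + 1]);
    }
}
```

With this approach there is no need to reorder the bytes before the call: `EndianReader.ReadInt16BigEndian(buffer, 5)` reads the value at the odd index 5 directly.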

score 1 · Accepted Answer

Why does this work regardless of the endianness of the machine?

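The method reinterprets the bytes, assuming they were produced in an environment with the same endianness. In other words, endianness affects both the order in which the input bytes are arranged in the array and the order in which they need to be arranged in the output short, and it affects both in the same way.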
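A small sketch of what that assumption means in practice. `ReinterpretDemo`, `RoundTrips` and `FromBigEndian` are hypothetical names. The first method shows that bytes produced on the same machine always read back correctly; the second shows that bytes known to be big-endian only need reversing when the machine is little-endian, and that this holds at any offset.

```csharp
using System;

static class ReinterpretDemo
{
    // GetBytes and ToInt16 both use the machine's byte order,
    // so a round trip should return the original value.
    public static bool RoundTrips(short value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        return BitConverter.ToInt16(bytes, 0) == value;
    }

    // Hypothetical helper: interprets two bytes of big-endian data.
    // The bytes are reversed only when the machine's order differs
    // from the order of the data.
    public static short FromBigEndian(byte[] buffer, int offset)
    {
        byte[] pair = { buffer[offset], buffer[offset + 1] };
        if (BitConverter.IsLittleEndian)
            Array.Reverse(pair);
        return BitConverter.ToInt16(pair, 0);
    }
}
```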

Why doesn't it use the same mechanism when the data is not aligned?

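This is a very good observation, and the reason the authors did not do the cast is not immediately obvious. I think the reason is that if you cast a pbyte holding an odd address to short*, the subsequent access through the short* would be unaligned. That requires a special opcode to avoid the hard exception that some platforms raise on unaligned access.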
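For comparison, a sketch of an explicitly unaligned read. It assumes a runtime where `System.Runtime.CompilerServices.Unsafe` is available (.NET Core 3.0 or later, or the NuGet package of the same name on older targets); `UnalignedRead` is a made-up wrapper name. `Unsafe.ReadUnaligned<T>` is documented as reading a value without assuming the address is aligned, which is the kind of access the answer describes. Like BitConverter, it interprets the bytes in the machine's own byte order.

```csharp
using System;
using System.Runtime.CompilerServices;

static class UnalignedRead
{
    // Hypothetical wrapper: reads a machine-endian short at any offset
    // without assuming that the address is 2-byte aligned.
    public static short ReadInt16(byte[] buffer, int offset)
    {
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || offset > buffer.Length - 2)
            throw new ArgumentOutOfRangeException(nameof(offset));

        return Unsafe.ReadUnaligned<short>(ref buffer[offset]);
    }
}
```

On any machine this should agree with `BitConverter.ToInt16(buffer, offset)` for both even and odd offsets.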
