c# - Large number of 0 values at the end of a byte array

Question

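I am using BitMiracle's LibTiff.Net to read a Bitmap image and return a TIFF byte[] that gets embedded in a file as a Base64 string. I noticed that the Base64 string ends up much longer than I expected, and its tail is a long run of "A" characters. While debugging, I saw that the byte[] LibTiff returns to me has several thousand 0 values at the end, which do not seem to be a necessary part of the image itself (as far as I can tell).

I am using BitMiracle's sample code here for the conversion: https://bitmiracle.github.io/libtiff.net/html/075f57db-d779-48f7-9fd7-4ca075a01599.htm

However, I don't quite understand what would cause the "garbage" at the end of the byte[]. Any ideas?

Edit to add code - GetTiffImageBytes() is in the link above: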

// The parameters and the declarations of file, data1, data2 and b are not shown in the original post.
public static byte[] GenImage()
{
    using (System.Drawing.Image frontImage = System.Drawing.Image.FromStream(file))
    {
        file.Close();

        // Draw something
        b = new Bitmap(frontImage);
        using (Graphics graphics = Graphics.FromImage(b))
        {
            graphics.DrawString(data1, (Font)GlobalDict.FontDict["font1"], Brushes.Black, 200, 490);
            graphics.DrawString(data2, (Font)GlobalDict.FontDict["font2"], Brushes.Black, 680, 400);
        }
    }

    // Convert to TIFF - requires BitMiracle.LibTiff.Classic
    byte[] tiffBytes = GetTiffImageBytes(b, false);

    return tiffBytes;
}

The call to the above is:

  byte[] aFrontImage = MiscTools.GenImage(somestuff);

  fileXML.WriteLine("    <FrontImage>" + System.Convert.ToBase64String(aFrontImage, 0, aFrontImage.Length) + "</FrontImage>");

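When all is said and done, it works fine and our application can read the resulting images. I am just trying to reduce the size, since some of these files may contain tens of thousands of images. I have some older sample files that were created by hand through another method, with Base64 strings that are about the same size apart from all the trailing bytes that I think are garbage.

As someone commented, one option might be to read the byte[] before conversion and strip all the 0 values from the end, but I am trying to figure out why this happens in the first place.

Thanks!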

score 3 · Accepted Answer

The problem is most likely this line, found in the linked sample source:

return ms.GetBuffer();

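For a MemoryStream, this returns the entire underlying array, even if you have not actually used all of it. If you write enough to fill it, the buffer is replaced with a larger one, but it does not grow to exactly the size needed; each time it grows to twice its previous size. The stream also has a Length property that indicates how much of that array is actually in use.

This is similar to the Capacity of a List<T>, which also doubles each time you fill the current capacity. The Count property indicates the number of items you actually have in the list.

The fix is simple. Replace the line of code above with: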
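To make the List<T> analogy concrete, here is a small sketch that compares Count and Capacity after adding a few items. The class and method names (ListCapacityDemo, Fill) are made up for illustration, and the exact capacities printed depend on the runtime's growth policy, so none are assumed here.

```csharp
using System;
using System.Collections.Generic;

public static class ListCapacityDemo
{
    // Hypothetical helper: adds n items and reports how many slots are used versus allocated.
    public static (int Count, int Capacity) Fill(int n)
    {
        var list = new List<int>();
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
        }
        return (list.Count, list.Capacity);
    }

    public static void Main()
    {
        foreach (int n in new[] { 1, 5, 100 })
        {
            var (count, capacity) = Fill(n);
            // Capacity is never smaller than Count, and is usually larger.
            Console.WriteLine("Count: " + count + ", Capacity: " + capacity);
        }
    }
}
```

Count plays the role of MemoryStream.Length here, and Capacity plays the role of the length of the array returned by GetBuffer().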

return ms.ToArray();

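This creates a new array just large enough to hold the bytes actually written to the memory stream, and copies the used portion of the buffer into it.

To verify that the buffer is larger than needed, you can run this simple code: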
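As a minimal sketch of the two ways to get only the written bytes (the class and helper names below are illustrative and are not part of LibTiff.Net or the linked sample): ToArray() copies the used portion, while GetBuffer() combined with Length lets you pass the used range straight to Convert.ToBase64String without the intermediate copy.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Hypothetical helper: returns only the bytes actually written to the stream.
    public static byte[] WrittenBytes(MemoryStream ms)
    {
        return ms.ToArray();
    }

    // Hypothetical helper: Base64-encodes the used part of the buffer without copying it first.
    public static string ToBase64(MemoryStream ms)
    {
        return Convert.ToBase64String(ms.GetBuffer(), 0, (int)ms.Length);
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

            Console.WriteLine("Written: " + WrittenBytes(ms).Length);
            Console.WriteLine("Buffer:  " + ms.GetBuffer().Length);

            // Both routes encode the same three bytes.
            Console.WriteLine(ToBase64(ms) == Convert.ToBase64String(ms.ToArray()));
        }
    }
}
```

Either route should produce a Base64 string without the long tail of "A" characters, since the unused zero-filled part of the buffer is never encoded.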

var ms = new MemoryStream();
Console.WriteLine("GetBuffer: " + ms.GetBuffer().Length);
Console.WriteLine("ToArray: " + ms.ToArray().Length);
ms.WriteByte(0);
Console.WriteLine("GetBuffer: " + ms.GetBuffer().Length);
Console.WriteLine("ToArray: " + ms.ToArray().Length);

This will output:

GetBuffer: 0
ToArray: 0
GetBuffer: 256
ToArray: 1

As you can see, writing just 1 byte grows the initial buffer to 256 bytes. After that, it doubles each time the current size is reached.

.NET Fiddle here.
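To watch the growth happen, a sketch like the following records the stream's Capacity each time it changes while bytes are written one at a time. The names CapacityGrowthDemo and GrowthSteps are invented for this example, and the exact step sizes are an implementation detail of the runtime rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CapacityGrowthDemo
{
    // Hypothetical helper: writes `count` single bytes and records the capacity each time it changes.
    public static List<int> GrowthSteps(int count)
    {
        var steps = new List<int>();
        using (var ms = new MemoryStream())
        {
            int last = ms.Capacity;
            for (int i = 0; i < count; i++)
            {
                ms.WriteByte(0);
                if (ms.Capacity != last)
                {
                    last = ms.Capacity;
                    steps.Add(last);
                }
            }
        }
        return steps;
    }

    public static void Main()
    {
        // Prints the sequence of buffer sizes seen while writing 1000 bytes.
        Console.WriteLine(string.Join(", ", GrowthSteps(1000)));
    }
}
```

Because the buffer grows in large steps, a stream holding a little more than one step's worth of data can carry almost as much unused space as data, which matches the "roughly half" saving reported in the question.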

score 0 · Accepted Answer

For now, I am just "fixing" the problem after the fact, and have created a method that I call on each image:

// Fix issue with garbage suffix data - reduces image byte[] size by roughly half.
private static byte[] fixImageByteArray(byte[] inByte)
{
    // Walk back from the end to the index of the last non-zero byte.
    int newByteBaseLength = inByte.Length - 1;
    while (newByteBaseLength >= 0 && inByte[newByteBaseLength] == 0)
    {
        --newByteBaseLength;
    }

    // Keep roughly 10% of the remaining tail as well.
    // When using newByteBaseLength + 1, some TIFF tag data was getting lost. This seems to resolve the issue.
    float newByteModifiedLength = ((inByte.Length - newByteBaseLength) * 0.1f) + 0.5f;
    int newByteModifiedLengthAsInt = (int)newByteModifiedLength;

    byte[] outByte = new byte[newByteBaseLength + newByteModifiedLengthAsInt];
    Array.Copy(inByte, outByte, newByteBaseLength + newByteModifiedLengthAsInt);

    return outByte;
}

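Edit: I changed the variable names to make more sense. I found that the old way of resizing the array (using newByteBaseLength + 1) caused some damage to the TIFF tags. With this slightly less efficient approach, the image size is still reduced significantly, but the tags stay intact.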
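One plausible explanation for the lost tag data, offered as an assumption rather than something confirmed in this thread: valid data can itself end in zero bytes (a TIFF image file directory, for example, ends with a 4-byte next-directory offset that is 0 for the last directory), so stripping every trailing zero can cut into real content. The sketch below uses a made-up payload and a hypothetical TrimTrailingZeros helper to show how trimming differs from relying on the stream's own length.

```csharp
using System;
using System.IO;

public static class TrailingZeroDemo
{
    // Hypothetical helper: strips every trailing zero byte from the array.
    public static byte[] TrimTrailingZeros(byte[] data)
    {
        int length = data.Length;
        while (length > 0 && data[length - 1] == 0)
        {
            length--;
        }
        byte[] result = new byte[length];
        Array.Copy(data, result, length);
        return result;
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Made-up payload whose last four bytes are legitimately zero.
            byte[] payload = { 7, 9, 0, 0, 0, 0 };
            ms.Write(payload, 0, payload.Length);

            // ToArray keeps all six written bytes, including the meaningful zeros.
            Console.WriteLine("ToArray: " + ms.ToArray().Length);

            // Trimming the raw buffer cannot tell padding from data and keeps only two bytes.
            Console.WriteLine("Trimmed: " + TrimTrailingZeros(ms.GetBuffer()).Length);
        }
    }
}
```

The stream's Length is the only reliable record of where the written data ends, which is why ToArray() avoids the need for a percentage-based safety margin.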
