c# - Most memory-efficient way to retrieve blob data from SQL Server

Question

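I am getting an out-of-memory exception when retrieving large blob data from SQL Server. I am calling a stored procedure that returns 6 columns of simple data and 1 varbinary(max) data column.

I am using this code to execute the stored procedure: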

m_DataReader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);

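and I make sure that I read the columns from the data reader in column order.

See the MSDN article on retrieving large data.

For the varbinary(max) column, I am reading the data like this: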

DocBytes = m_DataReader.GetValue(i) as byte[];

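What I have noticed is that, at the point where memory runs out, I seem to have 2 copies of the byte array in memory: one in the DocBytes array and the other inside the SqlDataReader.

Why is there a second copy? I assumed I would be handed a reference. Or is this down to the way SqlDataReader provides the data internally, i.e. does it always hand out a copy?

Is there a more efficient way to read the data from the database?

I have looked at the new .NET 4.5 GetStream method, but unfortunately I do not have the ability to pass the stream on. I need the bytes in memory, so I cannot follow the other examples that stream to a file or a web response. But I do want to make sure that only one copy exists in memory at a time!

I have come to the conclusion that this is probably just the way it has to be, and that the duplicate copy is simply a buffer that has not been garbage collected yet. I would rather not mess around with forcing garbage collection, and I am hoping someone has ideas about alternative approaches.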

score 1 · Accepted Answer

I have looked at the new .NET 4.5 GetStream method, but unfortunately, I do not have the ability to pass the stream on - I need the bytes in memory

So all you have to do is read from this stream into a byte array.

Alternatively you could try reading it in small chunks from the reader using the GetBytes method as shown here: https://stackoverflow.com/a/625485/29407
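
A minimal sketch of "read from this stream into a byte array", assuming the length of the field is known up front. The helper name StreamBytes.ReadExactly is made up for illustration and is not part of ADO.NET. The loop is there because Stream.Read may return fewer bytes than requested.

```csharp
using System;
using System.IO;

public static class StreamBytes
{
    // Hypothetical helper: reads exactly 'length' bytes from 'source' into a new array.
    // Only one array of the final size is allocated; no intermediate MemoryStream.
    public static byte[] ReadExactly(Stream source, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        byte[] result = new byte[length];
        int filled = 0;
        while (filled < length)
        {
            // Read may deliver a partial chunk, so keep going until the array is full.
            int read = source.Read(result, filled, length - filled);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the expected length was read.");
            filled += read;
        }
        return result;
    }
}
```

With the reader from the question this might be used as `using (Stream s = m_DataReader.GetStream(i)) DocBytes = StreamBytes.ReadExactly(s, docLength);`, where docLength is a hypothetical value, for example a DATALENGTH column that the stored procedure returns before the blob column.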

score 1 · Accepted Answer

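When retrieving binary data from SQL you have a choice. Assuming you are using varbinary (image is deprecated) as the data type, you can either return all of the data or return only part of it using a simple SUBSTRING function. If the binary is large (say 1 GB), returning all of it will consume a lot of memory.

If that is the case, you can take a more iterative approach to returning the data. Say it is a 1 GB binary: you can have the program loop through the data in 100 MB chunks, writing each chunk to disk, discarding the buffer, and then going back for the next 100 MB chunk.

To get the first chunk you would use: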

Declare @ChunkCounter as integer = 0
Declare @Data as varbinary(max)
Declare @ChunkSize as integer = 10000000 /* 10,000,000 bytes, roughly 10 MB */
Declare @bytes as integer
Select @bytes = datalength(YourField) from YourTable where ID = YourID
If @bytes > @ChunkSize
      Begin
           /* SUBSTRING is 1-based, so the first chunk starts at position 1 */
           Select @Data = substring(YourField, 1, @ChunkSize), @ChunkCounter = @ChunkCounter + 1
           FROM YourTable
           where ID = YourID
      End
Else
      Begin ....
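
On the client side, the loop over chunks comes down to working out the start position and length to pass to SUBSTRING for each round trip. The following is only a sketch of that arithmetic; ChunkPlanner and Plan are hypothetical names, and the actual query execution is left out.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Hypothetical helper: yields (start, length) pairs that cover 'totalBytes'.
    // Starts are 1-based because that is how T-SQL SUBSTRING counts positions.
    public static IEnumerable<(long Start, int Length)> Plan(long totalBytes, int chunkSize)
    {
        if (totalBytes < 0) throw new ArgumentOutOfRangeException(nameof(totalBytes));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        for (long start = 1; start <= totalBytes; start += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int length = (int)Math.Min(chunkSize, totalBytes - start + 1);
            yield return (start, length);
        }
    }
}
```

For a 25-byte value and a chunk size of 10, this yields (1, 10), (11, 10) and (21, 5). Each pair would be passed as the second and third arguments of SUBSTRING, and the returned chunk written out before the next one is requested.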

score 1 · Accepted Answer

The problem is that DbDataReader.GetStream() creates a MemoryStream and fills that stream with the field's data. To avoid this, I created an extension method:

using System;
using System.Data;
using System.IO;

public static class DataReaderExtensions
{
    /// <summary>
    /// writes the content of the field into a stream
    /// </summary>
    /// <param name="reader"></param>
    /// <param name="ordinal"></param>
    /// <param name="stream"></param>
    /// <returns>number of written bytes</returns>
    public static long WriteToStream(this IDataReader reader, int ordinal, Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");

        if (reader.IsDBNull(ordinal))
            return 0;

        long num = 0L;
        byte[] array = new byte[8192];
        long bytes;
        do
        {
            bytes = reader.GetBytes(ordinal, num, array, 0, array.Length);
            stream.Write(array, 0, (int)bytes);
            num += bytes;
        }
        while (bytes > 0L);
        return num;
    }

    /// <summary>
    /// writes the content of the field into a stream
    /// </summary>
    /// <param name="reader"></param>
    /// <param name="field"></param>
    /// <param name="stream"></param>
    /// <returns>number of written bytes</returns>
    public static long WriteToStream(this IDataReader reader, string field, Stream stream)
    {
        int ordinal = reader.GetOrdinal(field);
        return WriteToStream(reader, ordinal, stream);
    }
}

score 0 · Accepted Answer

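Do you know the length of the data? If so, you can use a streaming approach to copy the data into a byte[] of exactly the right size. That would eliminate the double buffering that appears to happen inside ADO.NET in the non-streaming case.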
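
One possible way to do this with GetBytes is sketched below. It relies on the documented behaviour that GetBytes returns the length of the field when the buffer argument is null. Whether a given provider reports a usable length for varbinary(max) in SequentialAccess mode is an assumption you would need to verify; if it does not, returning DATALENGTH in a column ahead of the blob is an alternative. BlobFieldReader and ReadField are made-up names.

```csharp
using System;
using System.Data;
using System.IO;

public static class BlobFieldReader
{
    // Hypothetical helper: copies a binary field into an array of exactly its size.
    public static byte[] ReadField(IDataRecord record, int ordinal)
    {
        if (record == null) throw new ArgumentNullException(nameof(record));
        if (record.IsDBNull(ordinal)) return null;

        // A null buffer asks the reader for the total length of the field.
        long length = record.GetBytes(ordinal, 0, null, 0, 0);
        if (length < 0 || length > int.MaxValue)
            throw new NotSupportedException("Field length is unknown or too large for one array.");

        byte[] result = new byte[(int)length];
        int filled = 0;
        while (filled < result.Length)
        {
            // Read straight into the final array, continuing where the last call stopped.
            long read = record.GetBytes(ordinal, filled, result, filled, result.Length - filled);
            if (read <= 0)
                throw new EndOfStreamException("Field ended before the reported length was read.");
            filled += (int)read;
        }
        return result;
    }
}
```

In the question's code this would replace the GetValue call, for example `DocBytes = BlobFieldReader.ReadField(m_DataReader, i);`.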

score 0 · Accepted Answer

DocBytes = m_DataReader.GetValue(i) as byte[];

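This creates a buffer of size DATA_LENGTH(column_name),
which is then copied in full into your MemoryStream.
That is bad when DATA_LENGTH(column_name) is large.
You need to copy it into the memory stream through a buffer.

Also, if your file is that large, write it to a temporary file rather than holding it entirely in a MemoryStream.

This is how I do it: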

    // http://stackoverflow.com/questions/2885335/clr-sql-assembly-get-the-bytestream
    // http://stackoverflow.com/questions/891617/how-to-read-a-image-by-idatareader
    // http://stackoverflow.com/questions/4103406/extracting-a-net-assembly-from-sql-server-2005
    public static void RetrieveFileStream(System.Data.IDbCommand cmd, string columnName, string path)
    {
        using (System.Data.IDataReader reader = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess | System.Data.CommandBehavior.CloseConnection))
        {
            bool hasRows = reader.Read();
            if (hasRows)
            {
                const int BUFFER_SIZE = 1024 * 1024 * 10; // 10 MB
                byte[] buffer = new byte[BUFFER_SIZE];

                int col = string.IsNullOrEmpty(columnName) ? 0 : reader.GetOrdinal(columnName);
                int bytesRead = 0;
                int offset = 0;

                // Write the byte stream out to disk
                using (System.IO.FileStream bytestream = new System.IO.FileStream(path, System.IO.FileMode.Create, System.IO.FileAccess.Write, System.IO.FileShare.None))
                {
                    while ((bytesRead = (int)reader.GetBytes(col, offset, buffer, 0, BUFFER_SIZE)) > 0)
                    {
                        bytestream.Write(buffer, 0, bytesRead);
                        offset += bytesRead;
                    } // Whend

                    bytestream.Close();
                } // End Using bytestream 

            } // End if (hasRows)

            reader.Close();
        } // End Using reader

    } // End Function RetrieveFileStream

Adapting this code to write to a MemoryStream is straightforward.
You may need to make the buffer size smaller or larger.

    public static System.IO.MemoryStream RetrieveMemoryStream(System.Data.IDbCommand cmd, string columnName, string path)
    {
        System.IO.MemoryStream ms = new System.IO.MemoryStream();

        using (System.Data.IDataReader reader = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess | System.Data.CommandBehavior.CloseConnection))
        {
            bool hasRows = reader.Read();
            if (hasRows)
            {
                const int BUFFER_SIZE = 1024 * 1024 * 10; // 10 MB
                byte[] buffer = new byte[BUFFER_SIZE];

                int col = string.IsNullOrEmpty(columnName) ? 0 : reader.GetOrdinal(columnName);
                int bytesRead = 0;
                int offset = 0;

                // Copy the field into the MemoryStream chunk by chunk
                while ((bytesRead = (int)reader.GetBytes(col, offset, buffer, 0, BUFFER_SIZE)) > 0)
                {
                    ms.Write(buffer, 0, bytesRead);
                    offset += bytesRead;
                } // Whend

            } // End if (hasRows)

            reader.Close();
        } // End Using reader

        return ms;
    } // End Function RetrieveMemoryStream

If you need to put it into Response.OutputStream, consider writing to it directly rather than going through MemoryStream.ToArray() + WriteBytes.
