0

我正在对几种在 C# .Net 中执行文件 IO 的方法进行性能测试。我执行的两个任务是将一个文本文件完整地读入缓冲区,然后解析该文件。数据是带有标题记录的制表符分隔记录,但这并不重要。

测试中唯一不同的是用于从文件中读取数据的方法。解析代码是相同的。但是,一种文件读取方法(直接调用 Windows API ReadFile)比另一种(File.ReadAllBytes)快 1.4 倍,但之后使用 ReadFile 读取的文件的解析时间比应用于数据时长 2.5 倍使用 File.ReadAllBytes 读取!

调用 ReadFile 涉及固定和取消固定内存和不安全代码。这是否解释了性能下降?有办法解决吗?调用 ReadFile 的更好方法?这是我的测试代码循环。我没有展示解析代码,但它的所有工作都仅限于扫描 IO 缓冲区、查找制表符和 CRLF 分隔符以及创建字符串。它本身没有不安全的代码,也没有对 String 类和 Encoding 类以及泛型 List 之外的外部方法的任何调用。

private void mctlReadFileButton_Click(object sender, EventArgs e)
{
    //MessageBox.Show(Separator, "Record Delimiter");
    var timer = new Stopwatch();
    var timer2 = new Stopwatch();
    int bufferSize = 128 * 1024 * 1024;
    byte[] ByteBuf = new byte[bufferSize];
    WinFileIO WFIO = null;

    double readTimeTotal = 0;
    double parseTimeTotal = 0;

    int loopCount = RepeatCount;

    mctlTimeTextBox.Text = "Running...";
    mctlTimeToParseTextBox.Text = "Running...";
    mctlTimeTextBox.Refresh();
    mctlTimeToParseTextBox.Refresh();

    for (var i = 0; i < loopCount; i++)
    {
        timer.Start();
        switch (IOMethod)
        {
            case "Windows API":
                WFIO = new WinFileIO(ByteBuf);
                WFIO.OpenForReading(FileName);
                WFIO.ReadBlocks(bufferSize);
                WFIO.Close();

                break;
            case "File.ReadAllBytes":
                ByteBuf = File.ReadAllBytes(FileName);
                break;
            default:
                break;
        }
        timer.Stop();
        readTimeTotal += (double)timer.ElapsedMilliseconds / 1000.0;
        timer.Reset();

        timer2.Start();
        if (ParseFile)
        {
            var parser = new ColumnSelector(new List<String>() 
            {
            "AcctID", "Policy#", "Location ID", "TIV", "LOSS Gross", "State"
            },
                ByteBuf
            );
            var columns = parser.Select();
        }
        if (WFIO != null)
            WFIO.Dispose();
        timer2.Stop();
        parseTimeTotal += (double)timer2.ElapsedMilliseconds / 1000.0;
        timer2.Reset();
    }

    var timeMsg = "" + loopCount + " times. " + readTimeTotal + " sec";
    mctlTimeTextBox.Text = timeMsg;
    mctlTimeTextBox.Refresh();

    timeMsg = "" + loopCount + " times. " + parseTimeTotal + " sec";
    mctlTimeToParseTextBox.Text = timeMsg;
    mctlTimeToParseTextBox.Refresh();


}

对 Windows ReadFile 函数的调用依赖于 Robert G. Bryan 编写的一些示例代码。这是他的 WinFileIO 类。我不相信我以任何方式对其进行了修改。

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

namespace Win32FileIO
{
    unsafe public class WinFileIO : IDisposable
    {
        // This class provides the capability to utilize the ReadFile and Writefile windows IO functions.  These functions
        // are the most efficient way to perform file I/O from C# or even C++.  The constructor with the buffer and buffer
        // size should usually be called to init this class.  PinBuffer is provided as an alternative.  The reason for this
        // is because a pointer needs to be obtained before the ReadFile or WriteFile functions are called.
        //
        // Error handling - In each public function of this class where an error can occur, an ApplicationException is
        // thrown with the Win32Exception message info if an error is detected.  If no exception is thrown, then a normal
        // return is considered success.
        // 
        // This code is not thread safe.  Thread control primitives need to be added if running this in a multi-threaded
        // environment.
        //
        // The recommended and fastest function for reading from a file is to call the ReadBlocks method.
        // The recommended and fastest function for writing to a file is to call the WriteBlocks method.
        //
        // License and disclaimer:
        // This software is free to use by any individual or entity for any endeavor for profit or not.
        // Even though this code has been tested and automated unit tests are provided, there is no gaurantee that
        // it will run correctly with your system or environment.  I am not responsible for any failure and you agree
        // that you accept any and all risk for using this software.
        //
        //
        // Written by Robert G. Bryan in Feb, 2011.
        //
        // Constants required to handle file I/O:
        private const uint GENERIC_READ = 0x80000000;
        private const uint GENERIC_WRITE = 0x40000000;
        private const uint OPEN_EXISTING = 3;
        private const uint CREATE_ALWAYS = 2;
        private const int BlockSize = 65536;
        //
        private GCHandle gchBuf;            // Handle to GCHandle object used to pin the I/O buffer in memory.
        private System.IntPtr pHandle;      // Handle to the file to be read from or written to.
        private void* pBuffer;              // Pointer to the buffer used to perform I/O.

        // Define the Windows system functions that are called by this class via COM Interop:
        [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)]
        static extern unsafe System.IntPtr CreateFile
        (
             string FileName,          // file name
             uint DesiredAccess,       // access mode
             uint ShareMode,           // share mode
             uint SecurityAttributes,  // Security Attributes
             uint CreationDisposition, // how to create
             uint FlagsAndAttributes,  // file attributes
             int hTemplateFile         // handle to template file
        );

        [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)]
        static extern unsafe bool ReadFile
        (
             System.IntPtr hFile,      // handle to file
             void* pBuffer,            // data buffer
             int NumberOfBytesToRead,  // number of bytes to read
             int* pNumberOfBytesRead,  // number of bytes read
             int Overlapped            // overlapped buffer which is used for async I/O.  Not used here.
        );

        [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)]
        static extern unsafe bool WriteFile
        (
            IntPtr handle,                     // handle to file
            void* pBuffer,             // data buffer
            int NumberOfBytesToWrite,    // Number of bytes to write.
            int* pNumberOfBytesWritten,// Number of bytes that were written..
            int Overlapped                     // Overlapped buffer which is used for async I/O.  Not used here.
        );

        [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)]
        static extern unsafe bool CloseHandle
        (
             System.IntPtr hObject     // handle to object
        );

        public WinFileIO()
        {
            pHandle = IntPtr.Zero;
        }

        public WinFileIO(Array Buffer)
        {
            // This constructor is provided so that the buffer can be pinned in memory.
            // Cleanup must be called in order to unpin the buffer.
            PinBuffer(Buffer);
            pHandle = IntPtr.Zero;
        }

        protected void Dispose(bool disposing)
        {
            // This function frees up the unmanaged resources of this class.
            Close();
            UnpinBuffer();
        }

        public void Dispose()
        {
            // This method should be called to clean everything up.
            Dispose(true);
            // Tell the GC not to finalize since clean up has already been done.
            GC.SuppressFinalize(this);
        }

        ~WinFileIO()
        {
            // Finalizer gets called by the garbage collector if the user did not call Dispose.
            Dispose(false);
        }

        public void PinBuffer(Array Buffer)
        {
            // This function must be called to pin the buffer in memory before any file I/O is done.
            // This shows how to pin a buffer in memory for an extended period of time without using
            // the "Fixed" statement.  Pinning a buffer in memory can take some cycles, so this technique
            // is helpful when doing quite a bit of file I/O.
            //
            // Make sure we don't leak memory if this function was called before and the UnPinBuffer was not called.
            UnpinBuffer();
            gchBuf = GCHandle.Alloc(Buffer, GCHandleType.Pinned);
            IntPtr pAddr = Marshal.UnsafeAddrOfPinnedArrayElement(Buffer, 0);
            // pBuffer is the pointer used for all of the I/O functions in this class.
            pBuffer = (void*)pAddr.ToPointer();
        }

        public void UnpinBuffer()
        {
            // This function unpins the buffer and needs to be called before a new buffer is pinned or
            // when disposing of this object.  It does not need to be called directly since the code in Dispose
            // or PinBuffer will automatically call this function.
            if (gchBuf.IsAllocated)
                gchBuf.Free();
        }

        public void OpenForReading(string FileName)
        {
            // This function uses the Windows API CreateFile function to open an existing file.
            // A return value of true indicates success.
            Close();
            pHandle = CreateFile(FileName, GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0);
            if (pHandle == System.IntPtr.Zero)
            {
                Win32Exception WE = new Win32Exception();
                ApplicationException AE = new ApplicationException("WinFileIO:OpenForReading - Could not open file " +
                  FileName + " - " + WE.Message);
                throw AE;
            }
        }

        public void OpenForWriting(string FileName)
        {
            // This function uses the Windows API CreateFile function to open an existing file.
            // If the file exists, it will be overwritten.
            Close();
            pHandle = CreateFile(FileName, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, 0, 0);
            if (pHandle == System.IntPtr.Zero)
            {
                Win32Exception WE = new Win32Exception();
                ApplicationException AE = new ApplicationException("WinFileIO:OpenForWriting - Could not open file " +
                    FileName + " - " + WE.Message);
                throw AE;
            }
        }

        public int Read(int BytesToRead)
        {
            // This function reads in a file up to BytesToRead using the Windows API function ReadFile.  The return value
            // is the number of bytes read.
            int BytesRead = 0;
            if (!ReadFile(pHandle, pBuffer, BytesToRead, &BytesRead, 0))
            {
                Win32Exception WE = new Win32Exception();
                ApplicationException AE = new ApplicationException("WinFileIO:Read - Error occurred reading a file. - " +
                    WE.Message);
                throw AE;
            }
            return BytesRead;
        }

        public int ReadUntilEOF()
        {
            // This function reads in chunks at a time instead of the entire file.  Make sure the file is <= 2GB.
            // Also, if the buffer is not large enough to read the file, then an ApplicationException will be thrown.
            // No check is made to see if the buffer is large enough to hold the file.  If this is needed, then
            // use the ReadBlocks function below.
            int BytesReadInBlock = 0, BytesRead = 0;
            byte* pBuf = (byte*)pBuffer;
            // Do until there are no more bytes to read or the buffer is full.
            for (; ; )
            {
                if (!ReadFile(pHandle, pBuf, BlockSize, &BytesReadInBlock, 0))
                {
                    // This is an error condition.  The error msg can be obtained by creating a Win32Exception and
                    // using the Message property to obtain a description of the error that was encountered.
                    Win32Exception WE = new Win32Exception();
                    ApplicationException AE = new ApplicationException("WinFileIO:ReadUntilEOF - Error occurred reading a file. - "
                        + WE.Message);
                    throw AE;
                }
                if (BytesReadInBlock == 0)
                    break;
                BytesRead += BytesReadInBlock;
                pBuf += BytesReadInBlock;
            }
            return BytesRead;
        }

        public int ReadBlocks(int BytesToRead)
        {
            // This function reads a total of BytesToRead at a time.  There is a limit of 2gb per call.
            int BytesReadInBlock = 0, BytesRead = 0, BlockByteSize;
            byte* pBuf = (byte*)pBuffer;
            // Do until there are no more bytes to read or the buffer is full.
            do
            {
                BlockByteSize = Math.Min(BlockSize, BytesToRead - BytesRead);
                if (!ReadFile(pHandle, pBuf, BlockByteSize, &BytesReadInBlock, 0))
                {
                    Win32Exception WE = new Win32Exception();
                    ApplicationException AE = new ApplicationException("WinFileIO:ReadBytes - Error occurred reading a file. - "
                        + WE.Message);
                    throw AE;
                }
                if (BytesReadInBlock == 0)
                    break;
                BytesRead += BytesReadInBlock;
                pBuf += BytesReadInBlock;
            } while (BytesRead < BytesToRead);
            return BytesRead;
        }

        public int Write(int BytesToWrite)
        {
            // Writes out the file in one swoop using the Windows WriteFile function.
            int NumberOfBytesWritten;
            if (!WriteFile(pHandle, pBuffer, BytesToWrite, &NumberOfBytesWritten, 0))
            {
                Win32Exception WE = new Win32Exception();
                ApplicationException AE = new ApplicationException("WinFileIO:Write - Error occurred writing a file. - " +
                    WE.Message);
                throw AE;
            }
            return NumberOfBytesWritten;
        }

        public int WriteBlocks(int NumBytesToWrite)
        {
            // This function writes out chunks at a time instead of the entire file.  This is the fastest write function,
            // perhaps because the block size is an even multiple of the sector size.
            int BytesWritten = 0, BytesToWrite, RemainingBytes, BytesOutput = 0;
            byte* pBuf = (byte*)pBuffer;
            RemainingBytes = NumBytesToWrite;
            // Do until there are no more bytes to write.
            do
            {
                BytesToWrite = Math.Min(RemainingBytes, BlockSize);
                if (!WriteFile(pHandle, pBuf, BytesToWrite, &BytesWritten, 0))
                {
                    // This is an error condition.  The error msg can be obtained by creating a Win32Exception and
                    // using the Message property to obtain a description of the error that was encountered.
                    Win32Exception WE = new Win32Exception();
                    ApplicationException AE = new ApplicationException("WinFileIO:WriteBlocks - Error occurred writing a file. - "
                        + WE.Message);
                    throw AE;
                }
                pBuf += BytesToWrite;
                BytesOutput += BytesToWrite;
                RemainingBytes -= BytesToWrite;
            } while (RemainingBytes > 0);
            return BytesOutput;
        }

        public bool Close()
        {
            // This function closes the file handle.
            bool Success = true;
            if (pHandle != IntPtr.Zero)
            {
                Success = CloseHandle(pHandle);
                pHandle = IntPtr.Zero;
            }
            return Success;
        }
    }
}

注意:我尝试将 WFIO.Dispose 移到 WFIO.Close 之后,但没有任何区别。同样,我尝试反复重用同一个 WFIO 对象。

4

1 回答 1

3

File.ReadAllBytes读取文件并返回一个与文件大小完全相同的数组。也就是说,如果文件长度为 94,864 字节,那么您的缓冲区将为byte[94864].

使用 Windows I/O 读取文件的代码希望您传入一个足够大的缓冲区来保存文件。您正在向它传递一个 128K 长的缓冲区。所以如果文件比那个小,最后会有一堆空白空间。

您不会将缓冲区长度传递给您的解析方法。

所以我最好的猜测是你的解析代码试图在缓冲区的末尾解析一堆垃圾,这就是需要这么长时间的原因。

于 2013-08-05T21:17:30.753 回答