.net - Performance comparison of IEnumerable versus raising an event for each item in the source?

Question

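I want to read a large binary file containing millions of records and produce some reports from those records. I use BinaryReader to read it (which I think has the best performance among the readers) and convert the bytes read into a data model. Because of the number of records, passing the models to the report layer is another issue: I would prefer IEnumerable, so that LINQ functionality and features are available when developing the reports.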

Here is the sample data class:

Public Class MyData
    Public A1 As UInt64
    Public A2 As UInt64
    Public A3 As Byte
    Public A4 As UInt16
    Public A5 As UInt64
End Class

I used this sub to create the file:

Sub CreateSampleFile()
    Using streamWriter As New FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.Write)
        For i As Integer = 1 To 1000
            For j As Integer = 1 To 1000
                For k = 1 To 30
                    Dim item As New MyData With {.A1 = i, .A2 = j, .A3 = k, .A4 = j, .A5 = i * j}
                    Dim bytes() As Byte = BitConverter.GetBytes(item.A1).Concat(BitConverter.GetBytes(item.A2)).Concat({item.A3}).Concat(BitConverter.GetBytes(item.A4)).Concat(BitConverter.GetBytes(item.A5)).ToArray
                    streamWriter.Write(bytes, 0, bytes.Length)
                Next
            Next
        Next
    End Using
End Sub

Here is my reader class:

Imports System.IO

Public Class FileReader

    Public Const BUFFER_LENGTH As Long = 4096 * 256 * 27
    Public Const MY_DATA_LENGTH As Long = 27
    Private _buffer(BUFFER_LENGTH - 1) As Byte
    Private _streamWriter As FileStream
    Public Event OnByteRead(sender As FileReader, bytes() As Byte, index As Long)

    Public Sub StartReadBinary(fileName As String)
        Dim currentBufferReadCount As Long = 0
        Using fileStream As New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using streamReader As New BinaryReader(fileStream)
                currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                While currentBufferReadCount > 0
                    For i As Integer = 0 To currentBufferReadCount - 1 Step MY_DATA_LENGTH
                        RaiseEvent OnByteRead(Me, Me._buffer, i)
                    Next
                    currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                End While
            End Using
        End Using
    End Sub

    Public Iterator Function GetAll(fileName As String) As IEnumerable(Of MyData)
        Dim currentBufferReadCount As Long = 0
        Using fileStream As New FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using streamReader As New BinaryReader(fileStream)
                currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                While currentBufferReadCount > 0
                    For i As Integer = 0 To currentBufferReadCount - 1 Step MY_DATA_LENGTH
                        Yield GetInstance(_buffer, i)
                    Next
                    currentBufferReadCount = streamReader.Read(Me._buffer, 0, Me._buffer.Length)
                End While
            End Using
        End Using
    End Function

    Public Function GetInstance(bytes() As Byte, index As Long) As MyData
        Return New MyData With {.A1 = BitConverter.ToUInt64(bytes, index), .A2 = BitConverter.ToUInt64(bytes, index + 8), .A3 = bytes(index + 16), .A4 = BitConverter.ToUInt16(bytes, index + 17), .A5 = BitConverter.ToUInt64(bytes, index + 19)}
    End Function

End Class
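
As a rough illustration of why an IEnumerable is attractive for the report layer, the sketch below runs a LINQ query over a lazily produced sequence of MyData. SampleRecords is a hypothetical in-memory stand-in for GetAll (it is not part of the original code), so the example runs without the data file.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinqReportSketch

    ' Same shape as the MyData class in the question.
    Public Class MyData
        Public A1 As UInt64
        Public A2 As UInt64
        Public A3 As Byte
        Public A4 As UInt16
        Public A5 As UInt64
    End Class

    ' Hypothetical stand-in for FileReader.GetAll: yields six records lazily.
    Iterator Function SampleRecords() As IEnumerable(Of MyData)
        For i As Integer = 1 To 3
            For k As Integer = 1 To 2
                Yield New MyData With {.A1 = CULng(i), .A2 = 1UL, .A3 = CByte(k), .A4 = 1US, .A5 = CULng(i)}
            Next
        Next
    End Function

    Sub Main()
        ' Count the records whose A3 field equals 1 without building a list first.
        Dim matching As Integer = SampleRecords().Where(Function(d) d.A3 = 1).Count()
        Console.WriteLine(matching) ' prints 3
    End Sub

End Module
```

With the event-based reader, the same report would have to accumulate its state inside the event handler instead of composing query operators.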

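I was wondering about the performance of IEnumerable, so I tried both approaches for each record read from the file: the GetAll method, which returns an IEnumerable, and raising an event. Here is the test module: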

Imports System.IO

Module Module1

    Private fileName As String = "MyData.dat"
    Private readerJustTraverse As New FileReader
    Private WithEvents readerWithoutInstance As New FileReader
    Private WithEvents readerWithInstance As New FileReader
    Private readerIEnumerable As New FileReader

    Sub Main()

        Dim s As New Stopwatch

        s.Start()
        readerJustTraverse.StartReadBinary(fileName)
        s.Stop()
        Console.WriteLine("Read bytes: {0}", s.ElapsedMilliseconds)

        s.Restart()
        readerWithoutInstance.StartReadBinary(fileName)
        s.Stop()
        Console.WriteLine("Read bytes, raise event: {0}", s.ElapsedMilliseconds)

        s.Restart()
        readerWithInstance.StartReadBinary(fileName)
        s.Stop()
        Console.WriteLine("Read bytes, raise event, get instance: {0}", s.ElapsedMilliseconds)

        s.Restart()
        For Each item In readerIEnumerable.GetAll(fileName)
            ' Only enumerate; the item itself is not used.
        Next
        s.Stop()
        Console.WriteLine("Read bytes, get instance, return yield: {0}", s.ElapsedMilliseconds)

        Console.ReadLine()

    End Sub

    Private Sub readerWithInstance_OnByteRead(sender As FileReader, bytes() As Byte, index As Long) Handles readerWithInstance.OnByteRead
        Dim item As MyData = sender.GetInstance(bytes, index)
    End Sub

    Private Sub readerWithoutInstance_OnByteRead(sender As FileReader, bytes() As Byte, index As Long) Handles readerWithoutInstance.OnByteRead
        'do nothing
    End Sub

End Module

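What I want to know is the elapsed time of each process. Here are the test results in milliseconds (tested on an ASUS Zenbook ultrabook, Core i7):

Read bytes: 384 (without touching the bytes read!)

Read bytes, raise event: 583

Read bytes, raise event, get instance: 3923

Read bytes, get instance, return yield: 4917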

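This shows that reading the file as bytes is very fast, while converting the bytes into models is slow. It also shows that raising events is faster than returning an IEnumerable: the IEnumerable version takes about 25% longer (4917 ms versus 3923 ms).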

Does iterating over an IEnumerable really have this performance cost, or am I missing something?

score 2 · Accepted Answer

Yes, using an iterator function comes with a performance penalty.

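I compiled your code and got the same results as you. I looked at the generated IL code. The state machine created from the GetAll method does contain a lot of code, but most of the instructions are nop or simple operations.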

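As you said, the results with and without the iterator function differ by about 25%. That is not very much. When you use StartReadBinary, one big loop simply calls the OnByteRead method (through the event) 30 million times, once per record (1000 × 1000 × 30). When you consume the objects in a For Each loop, however, each object requires a call to the generated enumerator's Current getter (get_Current) and a call to MoveNext(). The latter is not trivial: most of the code in GetAll is moved there, and it uses many compiler-generated variables.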
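
To make the per-item cost concrete, here is a minimal sketch of roughly what a For Each loop over an iterator function expands to. Numbers is a hypothetical stand-in for GetAll, and the explicit loop is an approximation of the compiler's expansion rather than its exact output.

```vb
Imports System
Imports System.Collections.Generic

Module IteratorSketch

    ' Hypothetical stand-in for GetAll: yields the integers 1..count.
    Iterator Function Numbers(count As Integer) As IEnumerable(Of Integer)
        For i As Integer = 1 To count
            Yield i
        Next
    End Function

    Sub Main()
        ' Roughly equivalent to: For Each n In Numbers(3) : sum += n : Next
        Dim sum As Integer = 0
        Using e As IEnumerator(Of Integer) = Numbers(3).GetEnumerator()
            While e.MoveNext()       ' resumes the state machine until the next Yield
                sum += e.Current     ' reads the value stored by the last Yield
            End While
        End Using
        Console.WriteLine(sum) ' prints 6
    End Sub

End Module
```

Each element therefore costs two interface calls plus the state-machine bookkeeping inside MoveNext, whereas the event version makes a single delegate invocation per record.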

Using Yield will generally slow your program down, because the compiler has to generate complex IL code to represent the state machine.
