visual-studio-2010 - Out-of-memory exception when reading XML from a file

Question

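I am working with Twitter API data. After storing the stream results in text files, I feed the data into a parser application. I planned for large data files, so I read the content using the delimiter ]} to separate the individual posts and avoid potential errors. As a backup, a second function reads the data through a buffer and then cuts it into individual posts. The problem is that in some cases a memory exception occurs on a single post. When I look at that post it does not seem particularly large, but the text contains foreign characters or some encoding that I guess causes the memory exception. I have not yet worked out whether that is really the cause, but I thought I would ask here for input or advice.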

        myreader.TextFieldType = FileIO.FieldType.Delimited
        myreader.SetDelimiters("]}}")
        Dim currentRow As String()

        Try

            While Not myreader.EndOfData
                Try
                    currentRow = myreader.ReadFields()
                    Dim currentField As String

                    For Each currentField In currentRow
                        data = data + currentField
                        counter += 1
                        If counter = 1000 Then
                            Dim pt As New parsingUtilities
                            If Not data = "" Then
                                pt.getNodes(data)
                                counter = 0
                            End If
                        End If
                    Next
                Catch ex As Exception
                    If ex.Message.Contains("MemoryException") Then
                        fileBKup()
                    End If
                End Try

The other place a memory exception occurs is when I try to split the content into individual posts:

    Dim sampleResults() As String
    Dim stringSplitter() As String = {"}}"}

    ' split the file content based on the closing entry tag
    sampleResults = Nothing
    Try
        sampleResults = post.Split(stringSplitter, StringSplitOptions.RemoveEmptyEntries)

    Catch ex As Exception
        appLogs.constructLog(ex.Message.ToString, True, True)
        moveErrorFiles(form1.infile)
        Exit Sub
    End Try

score 1 · Accepted Answer

I expect the problem is the strings.

Strings are immutable, which means that every time you think you are changing a string by doing this

    data = data + currentField

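you are actually creating another new string in memory. If you do this thousands of times, the discarded copies mount up (each one a little longer than the last) and you can end up with an OutOfMemoryException.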
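To make the immutability point concrete, here is a minimal sketch. The module name and sample values are made up for illustration; the only thing it shows is that `+` returns a new string object and leaves the old one untouched.

```vb
Imports System

Module ImmutabilityDemo
    Sub Main()
        Dim data As String = "abc"
        Dim before As String = data   ' second reference to the same string object

        ' Concatenation does not modify "abc"; it allocates a new string.
        data = data + "def"

        ' The two variables now point at different objects.
        Console.WriteLine(Object.ReferenceEquals(before, data))   ' False
        Console.WriteLine(before)                                  ' abc
    End Sub
End Module
```

The old string stays in memory until the garbage collector reclaims it, so a loop that appends to an ever-growing string keeps allocating larger and larger copies.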

If you are building up strings, you should use a StringBuilder instead.
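A rough sketch of what that could look like for the loop in the question. `ProcessBatch` is a hypothetical stand-in for `pt.getNodes`, the `fields` array stands in for what `myreader.ReadFields()` returns, and the batch size is shrunk from 1000 to 2 so the example is easy to follow. Resetting the buffer after each batch is my assumption about the intent; the snippet in the question resets `counter` but never seems to clear `data`.

```vb
Imports System
Imports System.Text

Module StringBuilderSketch
    ' Hypothetical stand-in for parsingUtilities.getNodes from the question.
    Private Sub ProcessBatch(ByVal batch As String)
        Console.WriteLine("Parsing batch of {0} characters", batch.Length)
    End Sub

    Sub Main()
        ' Made-up sample data; the real code would get fields from myreader.ReadFields().
        Dim fields As String() = {"post-1", "post-2", "post-3"}

        Dim buffer As New StringBuilder()
        Dim counter As Integer = 0
        Const batchSize As Integer = 2   ' the question uses 1000

        For Each currentField As String In fields
            ' Append writes into the builder's internal buffer instead of
            ' allocating a new string on every iteration.
            buffer.Append(currentField)
            counter += 1

            If counter = batchSize Then
                ProcessBatch(buffer.ToString())
                buffer.Length = 0   ' empty the buffer so it does not keep growing
                counter = 0
            End If
        Next

        ' Flush whatever is left after the last full batch.
        If buffer.Length > 0 Then
            ProcessBatch(buffer.ToString())
        End If
    End Sub
End Module
```

This only addresses the concatenation cost. If a single post really is huge, or the file never contains the delimiter, the reader itself may still try to hold too much in memory at once.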
