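I'm working with Twitter API data. After storing the stream results in text files, I feed the data into a parser application. I'm planning for large data files, so I read the content using the delimiter ]} to separate the individual posts and avoid potential errors. The backup function reads the data through a buffer and then cuts it into individual posts. The problem is that in some cases a single post triggers an out-of-memory exception. When I look at the post itself it doesn't seem particularly large, but the text contains foreign characters or some encoding, and I'm guessing that is what causes the memory exception. I haven't confirmed that this is really the cause, but I thought I'd ask here for some input or advice...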
myreader.TextFieldType = FileIO.FieldType.Delimited
myreader.SetDelimiters("]}}")
Dim currentRow As String()
Try
    While Not myreader.EndOfData
        Try
            currentRow = myreader.ReadFields()
            For Each currentField As String In currentRow
                ' accumulate fields and hand them to the parser every 1000 fields
                data = data + currentField
                counter += 1
                If counter = 1000 Then
                    Dim pt As New parsingUtilities
                    If Not data = "" Then
                        pt.getNodes(data)
                        counter = 0
                    End If
                End If
            Next
        Catch ex As Exception
            ' fall back to the buffered backup reader on an out-of-memory error
            If TypeOf ex Is OutOfMemoryException Then
                fileBKup()
            End If
        End Try
The other place where the memory exception occurs is when I try to split the content into separate posts:
Dim sampleResults() As String
Dim stringSplitter() As String = {"}}"}
' split the file content based on the closing entry tag
sampleResults = Nothing
Try
    sampleResults = post.Split(stringSplitter, StringSplitOptions.RemoveEmptyEntries)
Catch ex As Exception
    appLogs.constructLog(ex.Message.ToString, True, True)
    moveErrorFiles(form1.infile)
    Exit Sub
End Try
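
For comparison, here is a minimal sketch of reading the posts one at a time in fixed-size chunks, so that neither the whole file nor a growing concatenated string has to sit in memory at once. It is an illustration under stated assumptions, not the code from the application above: the module name PostStreaming, the function ReadPosts, and the chunkSize parameter are hypothetical, it assumes VB 11 / .NET 4.5 or later (for Iterator and Yield), and it assumes each post ends with the delimiter, which is kept on the returned text.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module PostStreaming
    ' Hypothetical helper: yields one post at a time from a TextReader.
    ' Only the unfinished post plus one chunk is held in memory.
    Public Iterator Function ReadPosts(reader As TextReader,
                                       delimiter As String,
                                       Optional chunkSize As Integer = 4096) As IEnumerable(Of String)
        If String.IsNullOrEmpty(delimiter) Then
            Throw New ArgumentException("delimiter must not be empty")
        End If

        Dim chunk(chunkSize - 1) As Char   ' array of chunkSize characters
        Dim pending As String = ""         ' text read so far that has no complete post yet
        Dim searchFrom As Integer = 0      ' skip the part already known to hold no delimiter

        Dim count As Integer = reader.Read(chunk, 0, chunk.Length)
        While count > 0
            pending &= New String(chunk, 0, count)

            Dim idx As Integer = pending.IndexOf(delimiter, searchFrom, StringComparison.Ordinal)
            While idx >= 0
                Dim endOfPost As Integer = idx + delimiter.Length
                Yield pending.Substring(0, endOfPost)   ' post including its closing delimiter
                pending = pending.Substring(endOfPost)
                idx = pending.IndexOf(delimiter, 0, StringComparison.Ordinal)
            End While

            ' A delimiter may straddle two chunks, so re-check the last few characters next time.
            searchFrom = Math.Max(0, pending.Length - delimiter.Length + 1)
            count = reader.Read(chunk, 0, chunk.Length)
        End While

        ' Whatever is left has no closing delimiter (for example a truncated last post).
        If pending.Trim().Length > 0 Then
            Yield pending
        End If
    End Function
End Module
```

Under those assumptions it could be driven by opening the file with a StreamReader and an explicit encoding (for example New StreamReader(path, System.Text.Encoding.UTF8), where path is a placeholder) and calling pt.getNodes(post) inside a For Each over ReadPosts(reader, "]}}"). Passing each post to the parser on its own also avoids re-sending text that was already parsed.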