
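I currently have a file containing 1 million characters. The file is 1 MB in size. I am trying to parse the data with this old function, which still works but is very slow.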

start0end
start1end
start2end
start3end
start4end
start5end
start6end

The code takes an agonizing 5 minutes or so to process the whole data set. Any pointers and suggestions are appreciated.

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim sFinal = ""
    Dim strData = textbox.Text
    Dim strFirst = "start"
    Dim strSec = "end"

    Dim strID As String, Pos1 As Long, Pos2 As Long, strCur As String = ""

    Do While InStr(strData, strFirst) > 0
        Pos1 = InStr(strData, strFirst)
        strID = Mid(strData, Pos1 + Len(strFirst))
        Pos2 = InStr(strID, strSec)

        If Pos2 > 0 Then
            strID = Microsoft.VisualBasic.Left(strID, Pos2 - 1)
        End If

        If strID <> strCur Then
            strCur = strID

            sFinal += strID & ","
        End If

        strData = Mid(strData, Pos1 + Len(strFirst) + 3 + Len(strID))
    Loop
End Sub

1 Answer


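The reason this is so slow is that you keep destroying and recreating a 1 MB string. Strings are immutable, so strData = Mid(strData... creates a new string and copies the remaining 1 MB of string data into the new strData variable over and over. Interestingly, even VB6 allowed a progressive index.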

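I would have processed the disk file line by line and plucked out the information as it was read (see the StreamReader.ReadLine reference) to avoid working with a 1 MB string at all. Pretty much the same method can be used there.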

' Requires "Imports System.Text" at the top of the file (for StringBuilder).

' 1 MB textbox data (!?)
Dim sData As String = TextBox1.Text
' start/stop - probably fake
Dim sStart As String = "start"
Dim sStop As String = "end"

' result
Dim sbResult As New StringBuilder
' progressive index: where the next search begins
Dim nNDX As Integer = 0

' shortcut at least as far as typing and readability
Dim MagicNumber As Integer = sStart.Length
' NEXT index of start/stop after nNDX
Dim i As Integer = sData.IndexOf(sStart, nNDX)
Dim j As Integer = 0

' loop as long as another start marker is found
Do While i >= 0
    j = sData.IndexOf(sStop, i + MagicNumber)   ' stop index
    If j < 0 Then Exit Do                       ' no closing marker: nothing more to extract

    ' Extract and append bracketed substring
    sbResult.Append(sData.Substring(i + MagicNumber, j - (i + MagicNumber)))
    ' add a cute comma
    sbResult.Append(",")

    nNDX = j + sStop.Length                     ' where we start next time
    i = sData.IndexOf(sStart, nNDX)             ' next start index
Loop

' remove last comma (only if anything was appended)
If sbResult.Length > 0 Then sbResult.Remove(sbResult.Length - 1, 1)

' show my work
Console.WriteLine(sbResult.ToString)

EDIT: Small mod for the ad hoc test data
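
For the line-by-line route mentioned earlier, here is a rough sketch rather than a tested drop-in. It assumes one start/end pair per line, as in the sample data. The module name LineParser and the function name ExtractIds are made up for illustration, and the demo reads from a StringReader so it runs without a file. For a real file you would pass a StreamReader opened on your own path instead.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module LineParser
    ' Hypothetical helper: reads lines from any TextReader and collects the
    ' text found between sStart and sStop on each line, comma separated.
    ' Assumes at most one start/stop pair per line.
    Function ExtractIds(reader As TextReader, sStart As String, sStop As String) As String
        Dim sbResult As New StringBuilder()
        Dim line As String = reader.ReadLine()

        Do While line IsNot Nothing
            Dim i As Integer = line.IndexOf(sStart, StringComparison.Ordinal)
            If i >= 0 Then
                Dim first As Integer = i + sStart.Length
                Dim j As Integer = line.IndexOf(sStop, first, StringComparison.Ordinal)
                If j >= 0 Then
                    ' separator goes in front of every item except the first
                    If sbResult.Length > 0 Then sbResult.Append(","c)
                    ' copy the characters straight from the line, no Substring
                    sbResult.Append(line, first, j - first)
                End If
            End If
            line = reader.ReadLine()
        Loop

        Return sbResult.ToString()
    End Function

    Sub Main()
        ' StringReader stands in for the file in this demo.
        Dim sample As String = String.Join(Environment.NewLine, "start0end", "start1end", "start2end")
        Using reader As New StringReader(sample)
            Console.WriteLine(ExtractIds(reader, "start", "end"))   ' prints 0,1,2
        End Using
    End Sub
End Module
```

Only one short line is held in memory at a time, so nothing the size of the whole file ever gets copied. Lines without a complete start/end pair are simply skipped here, which may or may not be what you want.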

answered 2013-10-18T22:55:38.490