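I'm writing a VB.NET Web Forms site, and one of its pages has to load a list of files into a list box. It needs to load every PDF and TIF file in a directory that has no entry in the database. I'm doing this successfully right now with the code below. Basically, I query the database to get an ArrayList of file name entries, then loop through each file in the directory, check its name against every entry in the ArrayList, and if the name is not in the ArrayList, add it to a list that gets bound to the list box: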

    Dim category As String = "RFQ"

    'Initialize database connection variables
    Dim sql As String
    Dim query As System.Data.SqlClient.SqlCommand
    Dim result As System.Data.SqlClient.SqlDataReader

    'Load document list from database
    Dim savedfiles As New ArrayList
    database.Open() 'Open connection to database  
    sql = "SELECT filename FROM fileheaders WHERE [category] = '" & category & "'" 'SQL query to read file header information
    query = New System.Data.SqlClient.SqlCommand(sql, database) 'Create query to send to database
    result = query.ExecuteReader() 'Execute query
    While result.Read()
        savedfiles.Add(row(result, "filename"))
    End While
    result.Close()
    database.Close() 'Close the connection opened above


    'The following code section pulls all files from the current file directory.
    Dim filelist = New ArrayList
    Dim dir As New System.IO.DirectoryInfo(dirName) 'Get directory information
    Dim files As System.IO.FileInfo() = dir.GetFiles() 'Get all files in directory
    Dim file As System.IO.FileInfo
    Dim i As Integer = 0
    For Each file In files
        If ((file.Extension Like ".pdf") Or (file.Extension Like ".tif")) And Not inArray(savedfiles, file.Name) Then
            filelist.Add(file.Name) 'Add .pdf and .tif files to list of documents
        End If
    Next

    filelist.TrimToSize()
    eltFilelist.DataSource = filelist
    eltFilelist.DataBind() 'Bind document list to listbox

And here is the code for the inArray function:

Function inArray(arr As ArrayList, str As String) As Boolean
    For Each item In arr
        If TypeOf (item) Is String Then
            If str = item Then
                Return True
            End If
        End If
    Next
    Return False
End Function

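Here is the problem: although it works, it seems very inefficient. There are about 27,000 files in the directory and about 26,000 file entries in the database, so I am checking each of 27,000 file names against a list of 26,000 names. Without turning this into a combinatorics problem, that is hundreds of millions of string comparisons. Is there a more efficient way to do this?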

2 Answers

Instead of an ArrayList, use a Dictionary or a Hashtable to hold the file names from the query.

Your inArray function does an O(n) scan of the whole list for every file it finds, which is very slow.

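Hashtable has a Contains method and Dictionary has ContainsKey; both find a file name by hash lookup, which takes roughly constant time and is far faster than scanning the list.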
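
A minimal sketch of the hash-lookup idea, not a drop-in replacement for the page code. It uses HashSet(Of String) from System.Collections.Generic, a hash-based collection this answer does not name, because only the keys are needed here. The helper name FilterUnsaved and the sample file names are hypothetical, and the case-insensitive comparer is an assumption that suits Windows file names.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashLookupSketch
    ' Hypothetical helper: returns the names in candidates that are not in savedNames.
    Function FilterUnsaved(savedNames As IEnumerable(Of String),
                           candidates As IEnumerable(Of String)) As List(Of String)
        ' Building the set is O(n); each Contains call is then O(1) on average.
        ' Assumption: names compare case-insensitively, as on Windows file systems.
        Dim saved As New HashSet(Of String)(savedNames, StringComparer.OrdinalIgnoreCase)
        Dim unsaved As New List(Of String)
        For Each fileName As String In candidates
            If Not saved.Contains(fileName) Then
                unsaved.Add(fileName)
            End If
        Next
        Return unsaved
    End Function

    Sub Main()
        ' Made-up sample data standing in for the database rows and the directory listing.
        Dim saved As String() = {"a.pdf", "b.tif"}
        Dim onDisk As String() = {"A.pdf", "b.tif", "c.pdf"}
        For Each fileName As String In FilterUnsaved(saved, onDisk)
            Console.WriteLine(fileName) ' prints c.pdf
        Next
    End Sub
End Module
```

In the question's code, this would amount to filling a set instead of the savedfiles ArrayList and replacing the inArray(savedfiles, file.Name) call with a Contains call on that set.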

answered 2013-08-27T20:13:15.980

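You can use SQL parameters to avoid problems with the category string (for example, if it contains an apostrophe, the concatenated query string breaks), get only the files in the directory that have the extensions you are interested in, and then use LINQ to get the missing files in a simple way: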

Imports System.Data
Imports System.Data.SqlClient
Imports System.IO
Imports System.Linq
Module Module1
    Function GetMissingFiles(sourceDirectory As String, category As String) As List(Of String)
        Dim missingFiles As New List(Of String)

        Dim filesInDatabase As New List(Of String)

        ' Query the database for the files in the given category'
        Using conn As New SqlConnection("connection string here")
            conn.Open()
            Dim sqlCmd As String = "SELECT filename FROM fileheaders WHERE [category] = @category"
            Using query As New SqlCommand(sqlCmd, conn)
                'TODO: change .SqlDbType to what it is in the database.'
                query.Parameters.Add(New SqlParameter With {.ParameterName = "@category", .SqlDbType = SqlDbType.NVarChar, .Value = category})

                ' Dispose the reader as soon as the names have been read.'
                Using rdr As SqlDataReader = query.ExecuteReader()
                    While rdr.Read()
                        filesInDatabase.Add(rdr.GetString(0))
                    End While
                End Using
            End Using
            ' End Using closes the connection, so no explicit conn.Close() is needed.'
        End Using

        'TODO: it could be that filesInDatabase.Count = 0 is valid. Adjust if required.'
        If filesInDatabase.Count > 0 Then
            ' Get the existing files from the given directory.

            ' the extensions we are going to consider
            Dim extensions() As String = {"pdf", "tif"}

            Dim existingFiles As New List(Of String)

            ' get all the filenames (without the path) to consider'
            For Each extn In extensions
                existingFiles.AddRange(Directory.GetFiles(sourceDirectory, "*." & extn).Select(Function(p) Path.GetFileName(p)))
            Next

            missingFiles = existingFiles.Except(filesInDatabase).ToList()

        End If

        Return missingFiles

    End Function
    Sub Whatever()
        Dim myMissingFiles As List(Of String) = Nothing
        Try
            myMissingFiles = GetMissingFiles("C:\temp", "RFQ")
        Catch ex As Exception
            ' Inform user it went wrong.'
        End Try

        If myMissingFiles IsNot Nothing AndAlso myMissingFiles.Count > 0 Then
            eltFilelist.DataSource = myMissingFiles
            eltFilelist.DataBind()
        End If

    End Sub

End Module
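
One thing to watch: Except compares strings with the default comparer, which is case-sensitive, so a file named Report.PDF on disk would not match report.pdf in the database. Whether that matters depends on how the names are stored, which the question does not say. If it does, Enumerable.Except has an overload that takes an IEqualityComparer. A small sketch with made-up sample names:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ExceptComparerSketch
    Sub Main()
        ' Sample data, made up for illustration.
        Dim existingFiles As New List(Of String) From {"Report.PDF", "scan.tif", "new.pdf"}
        Dim filesInDatabase As New List(Of String) From {"report.pdf", "scan.tif"}

        ' Case-insensitive set difference: names on disk with no database entry.
        Dim missingFiles As List(Of String) =
            existingFiles.Except(filesInDatabase, StringComparer.OrdinalIgnoreCase).ToList()

        For Each fileName As String In missingFiles
            Console.WriteLine(fileName) ' prints new.pdf
        Next
    End Sub
End Module
```

Except builds a hash set from the second sequence internally, so the comparison stays close to linear in the number of names rather than quadratic.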
answered 2013-08-27T21:35:45.720