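It looks like it would be sufficient to bunch files together if their timestamps differ by less than some timespan.
So, if you order the files by their .LastWriteTimeUtc, you can go through that list and check the time between each file and the previous one. If the gap is small, add the file to the current list; otherwise, start a new list.
I tested the following code on a directory of randomly chosen files, so 30 days was an appropriate timespan there; it looks like two or three seconds might suit your use: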
Option Infer On
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module Module1

    ''' <summary>
    ''' Get FileInfos bunched by virtue of having less than some time interval between their consecutive LastWriteTimeUtc when ordered by that.
    ''' </summary>
    ''' <param name="srcDir">Directory to get files from.</param>
    ''' <param name="adjacencyLimit">The allowable timespan to count as in the same bunch.</param>
    ''' <returns>A List(Of List(Of FileInfo)). Each inner list has consecutive LastWriteTimeUtc differences less than adjacencyLimit. The outer list is empty if the directory has no files.</returns>
    Function GetTimeAdjacentFiles(srcDir As String, adjacencyLimit As TimeSpan) As List(Of List(Of FileInfo))
        Dim di = New DirectoryInfo(srcDir)
        ' Materialise the sorted sequence once, so that Count and indexing do not re-run the sort.
        Dim fis = di.GetFiles().OrderBy(Function(fi) fi.LastWriteTimeUtc).ToList()

        Dim bins As New List(Of List(Of FileInfo))

        ' Return an empty list rather than Nothing so that callers can use For Each safely.
        If fis.Count = 0 Then
            Return bins
        End If

        ' Start the first bin with the oldest file.
        Dim thisBin As New List(Of FileInfo) From {fis(0)}

        For i = 1 To fis.Count - 1
            If fis(i).LastWriteTimeUtc - fis(i - 1).LastWriteTimeUtc < adjacencyLimit Then
                ' Close enough to the previous file: same bin.
                thisBin.Add(fis(i))
            Else
                ' Gap too large: close the current bin and start a new one.
                bins.Add(thisBin)
                thisBin = New List(Of FileInfo) From {fis(i)}
            End If
        Next

        ' Do not forget the last bin.
        bins.Add(thisBin)
        Return bins

    End Function

    Sub Main()
        Dim src = "E:\temp"
        'TODO: choose a suitable TimeSpan, e.g. TimeSpan.FromSeconds(3)
        Dim adjacencyLimit = TimeSpan.FromDays(30)
        Dim x = GetTimeAdjacentFiles(src, adjacencyLimit)

        For Each b In x
            Console.WriteLine("***********")
            For Each fi In b
                'TODO: merge each fi into a PDF.
                Console.WriteLine(fi.Name)
            Next
        Next

        Console.ReadLine()

    End Sub

End Module
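I suggest two to three seconds because if the files are stored on a FAT-type file system (e.g. FAT32, as may be used on USB memory sticks, old disk drives, etc.), then the resolution of the last-write timestamp will be two seconds. (exFAT stores the last-write time with finer resolution, so this mainly concerns the older FAT variants.)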
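To illustrate why the limit should not be much smaller than that, here is a minimal standalone sketch. It assumes, purely for illustration, that a two-second-resolution file system truncates the write time to an even second; the real rounding direction may differ, but the size of the gap comes out the same. TruncateToTwoSeconds and CountBunches are hypothetical helpers that are not part of the code above; CountBunches applies the same "less than adjacencyLimit" rule to bare DateTime values.

```vbnet
Option Infer On
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.Linq

Module TimestampResolutionDemo

    ' Hypothetical helper (an assumption, not a real API): truncate a DateTime
    ' to a two-second boundary to mimic a two-second-resolution timestamp.
    Function TruncateToTwoSeconds(t As DateTime) As DateTime
        Dim twoSeconds = TimeSpan.FromSeconds(2).Ticks
        Return New DateTime(t.Ticks - (t.Ticks Mod twoSeconds), t.Kind)
    End Function

    ' Count the bunches that the "gap < adjacencyLimit" rule produces
    ' for a set of timestamps.
    Function CountBunches(times As IEnumerable(Of DateTime), adjacencyLimit As TimeSpan) As Integer
        Dim sorted = times.OrderBy(Function(t) t).ToList()
        If sorted.Count = 0 Then
            Return 0
        End If

        Dim bunches = 1
        For i = 1 To sorted.Count - 1
            ' A gap that is not less than the limit starts a new bunch.
            If sorted(i) - sorted(i - 1) >= adjacencyLimit Then
                bunches += 1
            End If
        Next
        Return bunches
    End Function

    Sub Main()
        ' Two writes only half a second apart, straddling an even-second boundary.
        Dim first = New DateTime(2020, 1, 1, 10, 0, 1, 800, DateTimeKind.Utc)
        Dim second = first.AddMilliseconds(500)

        Dim stored = {TruncateToTwoSeconds(first), TruncateToTwoSeconds(second)}

        Console.WriteLine((stored(1) - stored(0)).TotalSeconds)           ' 2
        Console.WriteLine(CountBunches(stored, TimeSpan.FromSeconds(2)))  ' 2
        Console.WriteLine(CountBunches(stored, TimeSpan.FromSeconds(3)))  ' 1
    End Sub

End Module
```

Under that assumption, two files written half a second apart can show stored timestamps a full two seconds apart. Because the comparison is a strict "less than", a limit of exactly two seconds would split them into separate bunches, whereas three seconds keeps them together.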