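This is not a trivial task, because what you basically want is to measure how similar different file names are. My layman's approach would be to extract the title from each file name, normalize it, and then compare the titles using a shortest left-based match (a file counts as the same title if its normalized name starts with the shortest name seen for that title). Something like this might work: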
Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^\d+(-\d+)?\s+"
Set rs = CreateObject("ADOR.Recordset")
rs.Fields.Append "NormalizedName", 200, 255
rs.Fields.Append "Length", 3
rs.Fields.Append "Path", 200, 255
rs.Open
' Store the full paths of the files and their associated normalized name in
' a disconnected recordset. The "Length" field is used for sorting (see below).
For Each f In fso.GetFolder("C:\some\folder").Files
  normalizedName = LCase(re.Replace(fso.GetBaseName(f.Name), ""))
  rs.AddNew
  rs("NormalizedName").Value = normalizedName
  rs("Length").Value = Len(normalizedName)
  rs("Path").Value = f.Path
  rs.Update
Next
' sort to ensure that the shortest normalized name always comes first
rs.Sort = "NormalizedName, Length ASC"
ref = ""
Set keeplist = CreateObject("Scripting.Dictionary")
' MoveFirst raises an error on an empty recordset (e.g. an empty folder)
If Not rs.EOF Then rs.MoveFirst
Do Until rs.EOF
  path = rs("Path").Value
  name = rs("NormalizedName").Value
  currentExtension = LCase(fso.GetExtensionName(path))
  If ref <> "" And ref = Left(name, Len(ref)) Then
    ' same title as last file, so check if this one is a better match
    If extension <> "mp3" And currentExtension = "mp3" Then
      ' always pick MP3 version if it exists
      keeplist(ref) = path
      extension = currentExtension
    ElseIf extension = currentExtension _
        And IsNumeric(Left(fso.GetBaseName(keeplist(ref)), 1)) _
        And Not IsNumeric(Left(fso.GetBaseName(path), 1)) Then
      ' prefer file names not starting with a number when they have the
      ' same extension
      keeplist(ref) = path
    End If
  Else
    ' first file or different reference name
    ref = name
    extension = currentExtension
    keeplist.Add ref, path
  End If
  rs.MoveNext
Loop
rs.Close
For Each ref In keeplist
  WScript.Echo keeplist(ref)
Next
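
To see what the normalization and the left-based match do on their own, a small standalone sketch like the following may help. The sample file names and the helper names `NormalizeTitle` and `StartsWith` are made up for illustration; only the pattern and the comparison are taken from the script.

```vbscript
' Standalone check of the normalization and the left-based comparison.
' All file names below are invented examples.
Dim fso, re, sample
Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^\d+(-\d+)?\s+"

' Strip the extension and a leading track number, then lowercase.
Function NormalizeTitle(fileName)
    NormalizeTitle = LCase(re.Replace(fso.GetBaseName(fileName), ""))
End Function

' True if candidate starts with reference (the left-based match).
Function StartsWith(candidate, reference)
    StartsWith = (reference <> "" And Left(candidate, Len(reference)) = reference)
End Function

For Each sample In Array("01 Some Title.mp3", "1-02 Some Title.wma", "Some Title (live).mp3")
    WScript.Echo sample & " -> " & NormalizeTitle(sample)
Next

' Caveat: a different title that merely begins with the same text also matches.
If StartsWith(NormalizeTitle("Love Song.mp3"), NormalizeTitle("Love.mp3")) Then
    WScript.Echo "Love Song.mp3 would be grouped with Love.mp3"
End If
```

Running something like this against a handful of real file names first is a cheap way to find out whether the pattern fits your naming scheme.
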
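I'm pretty sure the code above doesn't cover some edge cases, so handle with care. Also note that the code processes only a single folder. Processing a folder tree requires additional code (see here, for example).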
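
For the folder tree case, one common way is to recurse into `SubFolders`. The following is only a minimal sketch: the procedure name `CollectFiles` and the dictionary `paths` are hypothetical, and it just gathers file paths, so the per-file work from the script (the `rs.AddNew` block) would still have to be plugged in where the comment indicates.

```vbscript
' Sketch: collect the paths of all files in a folder tree by recursing
' into SubFolders. CollectFiles and paths are illustrative names.
Dim fso, paths, p, root
Set fso = CreateObject("Scripting.FileSystemObject")

Sub CollectFiles(folder, paths)
    Dim f, sf
    For Each f In folder.Files
        ' Placeholder for the per-file work; here the path is only recorded.
        paths(f.Path) = True
    Next
    For Each sf In folder.SubFolders
        CollectFiles sf, paths
    Next
End Sub

root = "C:\some\folder"
Set paths = CreateObject("Scripting.Dictionary")
If fso.FolderExists(root) Then
    CollectFiles fso.GetFolder(root), paths
End If

For Each p In paths.Keys
    WScript.Echo p
Next
```

Whether duplicates should be detected per folder or across the whole tree is a separate decision. For per-folder detection, the comparison loop would have to run once for each folder instead of once over everything collected.
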