I've searched extensively for a VBScript answer to this, but I've given up and need help.

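What I want to accomplish is to find files that are obvious duplicates (obvious to a human, anyway) but have different file names. I need to delete the duplicates, keeping the copy that has no track number in its name. I also need to delete the M4A version if an MP3 version already exists.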

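Is this even possible? I've done a little VBScripting, but this is beyond my limited programming ability. I won't even bother copying the code I've tried here, since none of it worked.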

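Here is a sample folder that I want to clean up. I only want to keep the two unique songs in it. I only want the MP3 versions, and I don't want track numbers in their names.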

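07 Falling In Love (Is Hard On The K.mp3
1-15 Love In An Elevator.m4a
1-15 Love In An Elevator.mp3
15 Love In An Elevator.mp3
2-07 Falling In Love (Is Hard On The.m4a
2-07 Falling In Love (Is Hard On The.mp3
Falling In Love (Is Hard On The Knees).mp3
Love In An Elevator.mp3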

Thanks!

1 Answer

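This isn't a trivial task, because basically you want to measure the similarity/proximity of different file names. My layman's approach would be to extract the title from the file name, normalize it (strip a leading track number and lowercase the rest), and then compare the titles with a shortest left-based match: two files count as the same song when the shorter normalized title is a prefix of the longer one. Something like this might work: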

Set fso = CreateObject("Scripting.FileSystemObject")

Set re = New RegExp
re.Pattern = "^\d+(-\d+)?\s+"

Set rs = CreateObject("ADOR.Recordset")
rs.Fields.Append "NormalizedName", 200, 255
rs.Fields.Append "Length", 3
rs.Fields.Append "Path", 200, 255
rs.Open

' Store the full paths of the files and their associated normalized name in
' a disconnected recordset. The "Length" field is used for sorting (see below).
For Each f In fso.GetFolder("C:\some\folder").Files
  normalizedName = LCase(re.Replace(fso.GetBaseName(f.Name), ""))
  rs.AddNew
  rs("NormalizedName").Value = normalizedName
  rs("Length").Value = Len(normalizedName)
  rs("Path").Value = f.Path
  rs.Update
Next

' sort to ensure that the shortest normalized name always comes first
rs.Sort = "NormalizedName, Length ASC"

ref = ""
Set keeplist = CreateObject("Scripting.Dictionary")

rs.MoveFirst
Do Until rs.EOF
  path = rs("Path").Value
  name = rs("NormalizedName").Value
  currentExtension = LCase(fso.GetExtensionName(path))
  If ref <> "" And ref = Left(name, Len(ref)) Then
    ' same title as last file, so check if this one is a better match
    If extension <> "mp3" And currentExtension = "mp3" Then
      ' always pick MP3 version if it exists
      keeplist(ref) = path
      extension = currentExtension
    ElseIf extension = currentExtension _
        And IsNumeric(Left(fso.GetBaseName(keeplist(ref)), 1)) _
        And Not IsNumeric(Left(fso.GetBaseName(path), 1)) Then
      ' prefer file names not starting with a number when they have the
      ' same extension
      keeplist(ref) = path
    End If
  Else
    ' first file or different reference name
    ref = name
    extension = currentExtension
    keeplist.Add ref, path
  End If
  rs.MoveNext
Loop
rs.Close

For Each ref In keeplist
  WScript.Echo keeplist(ref)
Next
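
The script above only prints the files it would keep. If you then want to get rid of everything else, a helper along these lines might be a starting point. This is only a sketch: `RemoveUnkeptFiles` and its `dryRun` parameter are names I made up for illustration, and it assumes that every file in the folder that is not in `keeplist` really is a duplicate, which also covers any non-audio files sitting in that folder.

```vbscript
' Hypothetical helper, not part of the original script.
' Lists (dryRun = True) or deletes (dryRun = False) every file in folderPath
' whose full path is not one of the values stored in keeplist.
Sub RemoveUnkeptFiles(fso, folderPath, keeplist, dryRun)
  Dim keepPaths, key, f, doomed, p

  ' Build a case-insensitive lookup of the paths to keep.
  Set keepPaths = CreateObject("Scripting.Dictionary")
  keepPaths.CompareMode = vbTextCompare
  For Each key In keeplist.Keys
    keepPaths(keeplist(key)) = True
  Next

  ' Collect the candidates first, so the Files collection is not modified
  ' while it is still being enumerated.
  Set doomed = CreateObject("Scripting.Dictionary")
  For Each f In fso.GetFolder(folderPath).Files
    If Not keepPaths.Exists(f.Path) Then doomed(f.Path) = True
  Next

  For Each p In doomed.Keys
    If dryRun Then
      WScript.Echo "Would delete: " & p
    Else
      fso.DeleteFile p, True   ' True also removes read-only files
    End If
  Next
End Sub
```

Appended to the end of the script, a call such as `RemoveUnkeptFiles fso, "C:\some\folder", keeplist, True` would only list the candidates. I'd check that output (preferably under cscript.exe, so each line doesn't pop up as a message box) before switching the last argument to `False`.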

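I'm pretty sure there are some edge cases that the code above doesn't cover, so handle with care. Also note that the code only processes a single folder. Processing a folder tree requires additional code (see here, for example).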
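
For the folder tree part, the usual approach is a small recursive procedure over `SubFolders`. A minimal sketch, assuming you want to run the duplicate check separately for each folder; `CollectFolders` is a name I made up, and the sketch does no error handling for folders you aren't allowed to read:

```vbscript
' Hypothetical helper, not part of the original script.
' Adds the path of the given folder and of every folder below it
' to the dictionary passed in as result.
Sub CollectFolders(folder, result)
  Dim subfolder
  result(folder.Path) = True
  For Each subfolder In folder.SubFolders
    CollectFolders subfolder, result
  Next
End Sub

Dim fso, folders, root, folderPath
Set fso = CreateObject("Scripting.FileSystemObject")
Set folders = CreateObject("Scripting.Dictionary")

root = "C:\some\folder"
If fso.FolderExists(root) Then
  CollectFolders fso.GetFolder(root), folders
End If

For Each folderPath In folders.Keys
  ' Run the per-folder duplicate check here instead of just echoing the path.
  WScript.Echo folderPath
Next
```

Collecting the paths first and handling each folder on its own keeps the comparison local to one folder, so a song that legitimately appears on two albums in different folders is not treated as a duplicate.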

Answered 2013-06-02T13:00:55.853