2

例如,我在 excel 中有带有文件结构的行。

Row 1 c:\User\Folder100\13-25\File100.log
Row 2 c:\User\Folder200\11-16\File200.log
Row 3 c:\User\Folder300\21-20\File300.log
Row 4 c:\User\Folder400\13-25\File400.log
Row 5 c:\User\Folder400\9-10\File400.log
Row 6 c:\User\Folder500\8-16\File500.log
Row 7 c:\User\Folder600\8-16\File600.log
Row 8 c:\User\Folder700\11-16\File700.log
Row 9 c:\User\Folder700\9-40\File700.log

第一行没有任何问题,因为文件日志不同,但行(4 和 5) a 在两个不同的文件夹“c:\User\Folder400\13-25\”和 c 中有相同的日志:\User\Folder400\9-10\ 我想只保留 13-25(消除第 5 行),因为时间更近。

还有第 8 行和第 9 行,我只想保留第 8 行 (11-16)

Row 1 c:\User\Folder100\13-25\File100.log
Row 2 c:\User\Folder200\11-16\File200.log
Row 3 c:\User\Folder300\21-20\File300.log
Row 4 c:\User\Folder400\13-25\File400.log
Row 6 c:\User\Folder500\8-16\File500.log
Row 7 c:\User\Folder600\8-16\File600.log
Row 8 c:\User\Folder700\11-16\File700.log

(删除第 5 行和第 9 行)

你知道如何在 VBA 中制作它吗?

4

2 回答 2

3

下面的代码

  1. 使用 aRegEx将文件夹名称和文件编号提取到两个新列中(见下图)
  2. 按 B 列排序,然后按 C 列降序排序
  3. 使用 Excels 功能删除 B 列中存在重复项的整行Remove Duplicates(最新时间在 CV 列中排在首位,因此被保留)
  4. 删除两个工作列

更新:下面的代码假定“用户”之后的第一个文件夹文件名都非常匹配,因为它是重复的 - 最初的指导方针仍然模棱两可。此代码确实解决了问题中显示的示例

在此处输入图像描述

Sub Sliced()
    Dim lngRow As Long
    Dim lngCalc As Long
    Dim objReg As Object
    Dim objDic As Object
    Dim rng1 As Range
    Dim X()
    Dim Y()

    Set rng1 = Range([a1], Cells(Rows.Count, "A").End(xlUp))

    'See Patrick Matthews excellent article on using Regular Expressions with VBA
    Set objReg = CreateObject("vbscript.regexp")
    objReg.Pattern = "(.+\\){2}(.+\\)(\d+)\-\d+\\(.+)"

    'Speed up the code by turning off screenupdating and setting calculation to manual
    'Disable any code events that may occur when writing to cells
    With Application
        lngCalc = .Calculation
        .ScreenUpdating = False
        .Calculation = xlCalculationManual
        .EnableEvents = False
    End With

    'Test each area in the user selected range
    X = rng1.Value2
    Y = X
    For lngRow = 1 To UBound(X)
        'replace the leading zeroes
        X(lngRow, 1) = objReg.Replace(X(lngRow, 1), "$2$4")
        Y(lngRow, 1) = objReg.Replace(Y(lngRow, 1), "$3")
    Next

    Columns("B:C").Insert
    rng1.Offset(0, 1) = X
    rng1.Offset(0, 2) = Y

    With ActiveSheet.Sort
        .SortFields.Clear
        .SortFields.Add Key:=rng1.Offset(0, 1), _
                        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
        .SortFields.Add Key:=rng1.Offset(0, 2), _
                        SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
        .SetRange rng1.Cells(1).Offset(0, 1).Resize(rng1.Rows.Count, 2)
        .Header = xlGuess
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
    ActiveSheet.UsedRange.RemoveDuplicates Columns:=2, Header:=xlNo
    Columns("B:C").Delete

    'cleanup the Application settings
    With Application
        .ScreenUpdating = True
        .Calculation = lngCalc
        .EnableEvents = True
    End With

    Set objReg = Nothing
End Sub
于 2012-11-19T11:26:04.250 回答
2

这并不完全符合目的,而是说明了一种解决此类问题的方法。

它只考虑文件名和它前面的时间字符串。如果需要,可以添加该文件夹。

主要模块:

Option Explicit
Private dict As dictionary

'Prints the rows you need (time criterion applied) 
Private Sub FindDuplicates()
    Dim lastRow As Long, row As Long
    Dim x As Variant, v As Variant
    Dim fileName As String, timeString As String

    Set dict = CreateObject("Scripting.Dictionary")

    'Determine last row
    lastRow = Worksheets(1).Cells(Rows.Count, 1).End(xlUp).row

  'Iterate and store in dictionary  
  For row = 1 To lastRow
        x = Split(Cells(row, 1), Application.PathSeparator)
        fileName = x(UBound(x))
        timeString = x(UBound(x) - 1)
        AddDictEntry fileName, row, timeString
    Next row

    'Print results
    For Each v In dict.Keys
        Debug.Print "FileName: " & v & ", Recent Version: " & dict.Item(v)
    Next
End Sub

添加/删除字典条目:

Private Sub AddDictEntry(fileName As String, rowNo As Long, timeString As String)
    Dim timeParts As Variant, timeLong As Long

   'converts time string to long, for comparison
    timeParts = Split(timeString, "-")
    timeLong = CInt(timeParts(0)) * 100 + CInt(timeParts(1))

    'Adds entry to dictionary if time is more recent
    If (dict.Exists(fileName)) Then
        If CInt(dict.Item(fileName)) < timeLong Then
            dict(fileName) = timeLong
        End If
    Else
        dict.Add fileName, timeLong
    End If

End Sub

输入:

c:\User\Folder100\13-25\File100.log
c:\User\Folder200\11-16\File200.log
c:\User\Folder300\21-20\File300.log
c:\User\Folder400\13-25\File400.log
c:\User\Folder400\9-10\File400.log
c:\User\Folder300\22-20\File300.log

输出:

FileName: File100.log, Recent Version: 1325
FileName: File200.log, Recent Version: 1116
FileName: File300.log, Recent Version: 2220
FileName: File400.log, Recent Version: 1325
于 2012-11-19T11:10:01.837 回答