2

目前我有一个贯穿列表并删除重复值(在一列中)的宏,但事实证明它的效率非常低。对于它检查重复的每个条目,它必须遍历整个列;我的文件目前有 50,000 个条目,这不是一项小任务。

我认为宏工作的一种更简单的方法是让宏检查这个值是否在数组中。如果是,则删除条目所在的行。如果不是,则将值添加到数组中。

有人可以为宏的基本轮廓提供一些帮助吗?谢谢

4

4 回答 4

3

下面的代码将遍历您的源数据并将其存储在一个数组中,同时检查重复项。收集完成后,它使用数组作为键来知道要删除哪些列。

由于删除后大量的电位屏幕更新,请务必关闭屏幕更新。(包括)

Sub Example()
    Application.ScreenUpdating = false
    Dim i As Long
    Dim k As Long
    Dim StorageArray() As String
    Dim iLastRow As Long
    iLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    ReDim StorageArray(1 To iLastRow, 0 To 1)

    'loop through column from row 1 to the last row
    For i = 1 To iLastRow
        'add each sheet value to the first column of the array
        StorageArray(i, 0) = ActiveSheet.Range("A" & i).Value
        '- keep the second column as 0 by default
        StorageArray(i, 1) = 0
        '- as each item is added, loop through previously added items to see if its a duplicate
        For k = 1 To i-1
            If StorageArray(k, 0) = StorageArray(i, 0) Then
                'if it is a duplicate set the second column of the srray to 1
                StorageArray(i, 1) = 1
                Exit For
            End If
        Next k
    Next i

    'loop through sheet backwords and delete rows that were maked for deletion
    For i = iLastRow To 1 Step -1
        If StorageArray(i, 1) = 1 Then
            ActiveSheet.Range("A" & i).EntireRow.Delete
        End If
    Next i

    Application.ScreenUpdating = true
End Sub

根据要求,这是一种类似的方法,使用集合而不是数组进行键索引:(RBarryYoung)

Public Sub RemovecolumnDuplicates()
    Dim prev as Boolean
    prev = Application.ScreenUpdating
    Application.ScreenUpdating = false
    Dim i As Long, k As Long

    Dim v as Variant, sv as String
    Dim cl as Range, ws As Worksheet
    Set ws = ActiveWorksheet    'NOTE: This really should be a parameter ...

    Dim StorageArray As New Collection
    Dim iLastRow As Long
    iLastRow = ws.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    'loop through column from row 1 to the last row
    i = 1
    For k = 1 To iLastRow
        'add each sheet value to the collection
        Set cl = ws.Cells(i, 1)
        v = cl.Value
        sv = Cstr(v)

        On Error Resume Next
            StorageArray.Add v, sv
        If Err.Number <> 0 Then
            'must be a duplicate, remove it
            cl.EntireRow.Delete
            'Note: our index doesn't change here, since all of the rows moved
        Else
            'not a duplicate, so go to the next row
            i = i + 1
        End If
    Next k

    Application.ScreenUpdating = prev
End Sub

请注意,此方法不需要为列中单元格的值假定任何数据类型或整数限制。

(Mea Culpa:我不得不在记事本中手动输入,因为我的 Excel 现在正忙于运行项目测试。所以可能存在一些拼写/语法错误......)

于 2012-07-12T17:20:53.477 回答
1

这是我的评论的后续行动。对于这样一个简单的操作,循环 50k 记录+循环数组将是一个过度杀戮。

就像我在评论中提到的那样,将数组中的值复制到新工作表中。然后在 50k 条目旁边插入一个空白列并执行Vlookupor CountIf。完成后,执行自动筛选,然后一次性删除重复的条目。让我们举个例子,看看它是如何工作的。

假设我们有一个包含 1000 个项目的数组?在一张纸中,我们有 50k 个数据。下面的代码将被测试1000 items in Array50k Data查看快照

在此处输入图像描述

将此代码粘贴到模块中(代码完成时间少于 5 秒

在此处输入图像描述

Sub Sample()
    Dim ws As Worksheet, wstemp As Worksheet
    Dim LRow As Long
    Dim Ar(1 To 1000) As Long
    Dim startTime As String, EndTime As String

    startTime = Format(Now, "hh:mm:ss")

    Set ws = Sheets("Sheet1")
    Set wstemp = Sheets.Add

    '~~> Creating a dummy array
    For i = 1 To 1000
        Ar(i) = i
    Next i

    '~~> Copy it to the new sheet
    wstemp.Range("A1:A1000").Value = Application.Transpose(Ar)

    With ws
        LRow = .Range("A" & .Rows.Count).End(xlUp).Row

        .Columns(2).Insert Shift:=xlToRight
        .Range("B1").Value = "For Deletion"
        .Range("B2:B" & LRow).FormulaR1C1 = "=COUNTIF(" & wstemp.Name & "!C[-1],RC[-1])"
        .Columns(2).Value = .Columns(2).Value

        '~~> Remove any filters
        .AutoFilterMode = False

        '~~> Filter, offset(to exclude headers) and delete visible rows
        With .Range("B1:B" & LRow)
            .AutoFilter Field:=1, Criteria1:="<>0"
            .Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With

        '~~> Remove any filters
        .AutoFilterMode = False

        .Columns(2).Delete
    End With

    EndTime = Format(Now, "hh:mm:ss")

    MsgBox "The process started at " & startTime & " and finished at" & EndTime
End Sub
于 2012-07-12T18:01:12.583 回答
1

对于 Excel 2007 及更高版本:将数组复制到工作表并使用 removeduplicates 方法:

set ws = worksheets.add
ws.[A1].resize(ubound(yourarray,1),ubound(yourarray,2)).value = yourarray
ws.usedrange.removeduplicates columns:=1, header:=no

这假设您的数组的下限是 1,您要删除重复的列是第 1 列,并且您的列表没有标题。然后,您可以找到新范围的边界并将其读回您的数组(首先擦除当前数组)。

于 2012-07-13T20:48:13.467 回答
0

我建议填充您的列,然后使用公式查找重复项并删除它们。我没有你的实际代码(你没有给我们任何代码)

dim a as range
dim b as range
set a = Range ("A1")

Do while Not isEmpty(A)
Set b = a.offset(1,0)

If b = a then
b= ""
else a.offset (1,0)

Loop

我相信您可以在运行宏之前将过滤器放入代码中,或者只是记住过滤器。

于 2012-07-12T17:18:58.433 回答