excel - Remove duplicates from an array using VBA

Question

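Say I have a block of data in Excel 2010, 100 rows by 3 columns.

Column C contains some duplicates; say it starts as

1, 1, 1, 2, 3, 4, 5, ..., 97, 98

Using VBA, I would like to remove the duplicate rows so that I am left with 98 rows and 3 columns:

1, 2, 3, ..., 97, 98

I know there is a button in Excel 2010 that does this, but it then interferes with the rest of my code and gives incorrect results.

Furthermore, I would like to do this with arrays and then paste the result onto the worksheet, rather than use methods such as Application.Worksheetfunction.countif(.....

So something like: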

Dim myarray() as Variant
myarray=cells(1,1).Currentregion.value

Dim a as Long

For a=1 to Ubound(myarray,1)

    'something here to 

Next a

score 8 · Accepted Answer

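' Works on a 1-D array; returns a 0-based array of the first occurrences
Function eliminateDuplicate(poArr As Variant) As Variant
    Dim poArrNoDup() As Variant
    Dim dupArrIndex As Long, i As Long, j As Long
    Dim dupBool As Boolean

    dupArrIndex = -1
    For i = LBound(poArr) To UBound(poArr)
        dupBool = False

        ' Has this value already appeared earlier in the input?
        For j = LBound(poArr) To i
            If poArr(i) = poArr(j) And Not i = j Then
                dupBool = True
            End If
        Next j

        ' First occurrence: append it to the output array
        If dupBool = False Then
            dupArrIndex = dupArrIndex + 1
            ReDim Preserve poArrNoDup(dupArrIndex)
            poArrNoDup(dupArrIndex) = poArr(i)
        End If
    Next i

    eliminateDuplicate = poArrNoDup
End Function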

score 8 · Accepted Answer

I answered a similar question. Here is the code I used:

Sub DeleteDuplicateRows()
    Dim dict As Object
    Dim rowCount As Long
    Dim strVal As String

    Set dict = CreateObject("Scripting.Dictionary")

    rowCount = Sheet1.Range("A1").CurrentRegion.Rows.Count

    'Work from the bottom up so deleting a row does not shift rows still to be checked.
    'Row 1 is treated as a header; because the scan runs upward, the lowest occurrence
    'of each key is the one kept. The key is read from column 1 (A); use 3 for column C.
    'you can change the loop condition to iterate through the array rows instead
    Do While rowCount > 1
      strVal = Sheet1.Cells(rowCount, 1).Value2

      If dict.Exists(strVal) Then
        Sheet1.Rows(rowCount).EntireRow.Delete
      Else
        'if doing this with an array, then add code in the Else block
        ' to assign values from this row to the array of unique values
        dict.Add strVal, 0
      End If

      rowCount = rowCount - 1
    Loop

    Set dict = Nothing
End Sub

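If you want to use an array instead, loop through its elements with the same conditional (If/Else) statement. If the item does not exist in the dictionary yet, add it to the dictionary and copy the row's values to another array.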
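A minimal sketch of that array variant, assuming the data is a 1-based 2-D array as returned by Range.Value, that column 3 (C) holds the key, and that the first occurrence of each key should be kept. UniqueRowsByKey and RemoveDuplicateRowsInArray are hypothetical names. It makes two passes instead of copying rows inside the Else block, because ReDim Preserve can only resize the last dimension of an array, so the output cannot grow row by row.

```vba
' Hypothetical helper: keep the first row for each distinct value in column keyCol
Function UniqueRowsByKey(data As Variant, ByVal keyCol As Long) As Variant
    Dim dict As Object, rowIdx As Variant
    Dim r As Long, c As Long, outRow As Long
    Dim result() As Variant

    Set dict = CreateObject("Scripting.Dictionary")

    ' Pass 1: record the row number of the first occurrence of each key
    For r = LBound(data, 1) To UBound(data, 1)
        If Not dict.Exists(data(r, keyCol)) Then dict.Add data(r, keyCol), r
    Next r

    ' Pass 2: copy those rows, in their original order, into a right-sized array
    ReDim result(1 To dict.Count, LBound(data, 2) To UBound(data, 2))
    For Each rowIdx In dict.Items
        outRow = outRow + 1
        For c = LBound(data, 2) To UBound(data, 2)
            result(outRow, c) = data(rowIdx, c)
        Next c
    Next rowIdx

    UniqueRowsByKey = result
End Function

' Hypothetical driver: assumes the block starts at A1 on the active sheet, no header row
Sub RemoveDuplicateRowsInArray()
    Dim data As Variant, uniq As Variant
    data = ActiveSheet.Range("A1").CurrentRegion.Value
    uniq = UniqueRowsByKey(data, 3)              ' column C is the key
    ' Clear the old block, then paste the unique rows back from A1
    ActiveSheet.Range("A1").CurrentRegion.ClearContents
    ActiveSheet.Range("A1").Resize(UBound(uniq, 1), UBound(uniq, 2)).Value = uniq
End Sub
```

Because Scripting.Dictionary returns Items in insertion order, the surviving rows stay in their original order.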

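Honestly, I think the most efficient approach is to adapt the code you get from the macro recorder. You can do all of the above in a single line: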

    Sheet1.UsedRange.RemoveDuplicates Columns:=3, Header:=xlYes

score 4 · Accepted Answer

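An improvement on the answers by @RBILLC and @radoslav006: this version looks for each value in the array of values already deduplicated, so it has fewer values to search when checking for a duplicate.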

Public Function RemoveDuplicatesFromArray(sourceArray As Variant) As Variant
    Dim duplicateFound As Boolean
    Dim arrayIndex As Long, i As Long, j As Long
    Dim deduplicatedArray() As Variant

    ' arrayIndex is the last used slot of deduplicatedArray (-1 = still empty)
    arrayIndex = -1

    For i = LBound(sourceArray) To UBound(sourceArray)
        duplicateFound = False

        ' Search only the unique values collected so far (0 To arrayIndex);
        ' while arrayIndex = -1 the loop body does not run
        For j = 0 To arrayIndex
            If sourceArray(i) = deduplicatedArray(j) Then
                duplicateFound = True
                Exit For
            End If
        Next j

        If duplicateFound = False Then
            arrayIndex = arrayIndex + 1
            ReDim Preserve deduplicatedArray(0 To arrayIndex)
            deduplicatedArray(arrayIndex) = sourceArray(i)
        End If
    Next i

    RemoveDuplicatesFromArray = deduplicatedArray
End Function

score 2 · Accepted Answer

A simple function to remove duplicates from a one-dimensional array:

Private Function DeDupeArray(vArray As Variant) As Variant
  Dim oDict As Object, i As Long
  Set oDict = CreateObject("Scripting.Dictionary")
  For i = LBound(vArray) To UBound(vArray)
    oDict(vArray(i)) = True
  Next
  DeDupeArray = oDict.keys()
End Function

Edit:

With stdVBA (a library mainly maintained by me), you can use:

uniqueValues = stdEnumerator.CreateFromArray(myArray).Unique().AsArray()

score 2 · Accepted Answer

Here is another approach using arrays:

Sub tester()

    Dim arr, arrout
    
    arr = Range("A1").CurrentRegion.Value   'collect the input array
     
    arrout = UniqueRows(arr)                'get only unique rows
    
    Range("H1").Resize(UBound(arrout, 1), UBound(arrout, 2)).Value = arrout
    
End Sub




Function UniqueRows(arrIn As Variant) As Variant
    Dim keys, rw As Long, col As Long, k, sep, arrout
    Dim dict As Object, lbr As Long, lbc As Long, ubr As Long, ubc As Long, rwOut As Long
    Set dict = CreateObject("scripting.dictionary")
    'input array bounds
    lbr = LBound(arrIn, 1)
    ubr = UBound(arrIn, 1)
    lbc = LBound(arrIn, 2)
    ubc = UBound(arrIn, 2)
    ReDim keys(lbr To ubr)
    'First pass:collect all the row "keys" in an array 
    '    and unique keys in a dictionary
    For rw = lbr To ubr
        k = "": sep = ""
        For col = lbc To ubc
            k = k & sep & arrIn(rw, col)
            sep = Chr(0)
        Next col
        keys(rw) = k     'collect key for this row
        dict(k) = True   'just collecting unique keys
    Next rw

    'Resize output array to # of unique rows
    ReDim arrout(lbr To dict.Count + (lbr - 1), lbc To ubc)
    rwOut = lbr
    'Second pass: copy each unique row to the output array
    For rw = lbr To ubr
        If dict(keys(rw)) Then      'not yet output?
            For col = lbc To ubc    'copying this row over to output...
                arrout(rwOut, col) = arrIn(rw, col)
            Next col
            rwOut = rwOut + 1      'increment output "row"
            dict(keys(rw)) = False 'flag this key as copied
        End If
    Next rw
    UniqueRows = arrout
End Function

score 1 · Accepted Answer

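I think this really is a case for Excel's native functions, at least for getting the initial array; I don't think there is a simpler way to do it. This sub outputs the unique values starting in column 5. It assumes the target range is empty; if it isn't, change r and c. Note that Application.Unique wraps the UNIQUE worksheet function, which exists only in Excel 2021 and Microsoft 365, not in Excel 2010.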

Sub testUniques()
    
    Dim arr, r As Long, c As Long, h As Long, w As Long
    Dim this As Worksheet: Set this = ActiveSheet
    arr = Application.Unique(this.Cells(1, 1).CurrentRegion)
    
    r = 1
    c = 5
    h = UBound(arr, 1) - 1
    w = UBound(arr, 2) - 1
    
    this.Range(this.Cells(r, c), this.Cells(r + h, c + w)) = arr
    
End Sub

score 1 · Accepted Answer

@RBILLC's answer can easily be improved by adding Exit For inside the inner loop:

Function eliminateDuplicate(poArr As Variant) As Variant
    Dim poArrNoDup()

    dupArrIndex = -1
    For i = LBound(poArr) To UBound(poArr)
        dupBool = False

        For j = LBound(poArr) To i
            If poArr(i) = poArr(j) And Not i = j Then
                dupBool = True
                Exit For
            End If
        Next j

        If dupBool = False Then
            dupArrIndex = dupArrIndex + 1
            ReDim Preserve poArrNoDup(dupArrIndex)
            poArrNoDup(dupArrIndex) = poArr(i)
        End If
    Next i

    eliminateDuplicate = poArrNoDup
End Function

score 0 · Accepted Answer

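I know this is old, but here is what I use to copy duplicate values to another range so I can see them at a glance, which helps me keep data integrity in a database I build from various spreadsheets. To make the procedure delete the duplicates instead, just replace the dupRng lines with Cell.Delete Shift:=xlToLeft or something similar.

I haven't tested that change myself, but it should work.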

Sub PartCompare()
    Dim ws As Worksheet
    Dim partRng As Range, partArr() As Variant, i As Long
    Dim Cell As Range, lrow As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    ' Last used row in column A
    lrow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    i = 0

    ' Qualify Cells with the sheet so this also works when Sheet1 is not active
    Set partRng = ws.Range(ws.Cells(1, 1), ws.Cells(lrow, 1))

    ' Copy column A into a 0-based 1-D array
    For Each Cell In partRng.Cells
        ReDim Preserve partArr(i)
        partArr(i) = Cell.Value
        i = i + 1
    Next

    Dim dupRng As Range, j As Long, x As Long, c As Long

    ' Duplicates are listed from D1 down, with their addresses in column E
    Set dupRng = ws.Range("D1")

    x = 0
    c = 1   ' partArr index of the element just after the current cell
    For Each Cell In partRng.Cells
        ' Look for the same value further down the list
        For j = c To UBound(partArr)
            If partArr(j) = Cell.Value Then
                dupRng.Offset(x, 0).Value = Cell.Value
                dupRng.Offset(x, 1).Value = Cell.Address()
                x = x + 1
                Exit For
            End If
        Next j
        c = c + 1
    Next Cell
End Sub

score 0 · Accepted Answer

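Remove duplicates from an array (plus the related row items)

Since the OP wants a VBA solution close to RemoveDuplicates, I demonstrate an array approach using a dictionary to get not the unique items themselves (dict.keys) but the row indices of their first occurrences (dict.items).

These indices are used to keep the entire row data via the procedure LeaveUniques, which benefits from the advanced possibilities of the Application.Index() function; see Some peculiarities of the Application.Index function.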
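To see what that Application.Index call does in isolation, here is a minimal sketch. PickRows is a hypothetical helper; it assumes data is a 1-based 2-D array and that rowNums lists at least two row numbers. Passing a vertical array of row numbers together with a flat array of column numbers makes Index return every listed row with every column, which is the same call shape LeaveUniques uses.

```vba
' Hypothetical helper: return the listed rows of a 1-based 2-D array
Function PickRows(data As Variant, rowNums As Variant) As Variant
    Dim cols() As Variant, c As Long

    ' Flat (1-D) list of all column numbers: 1, 2, ..., n
    ReDim cols(0 To UBound(data, 2) - 1)
    For c = 1 To UBound(data, 2)
        cols(c - 1) = c
    Next c

    ' Transpose turns the flat list of row numbers into a vertical n x 1 array,
    ' so Index pairs every listed row with every column
    PickRows = Application.Index(data, Application.Transpose(rowNums), cols)
End Function
```

For example, PickRows(data, Array(1, 3)) returns a 2-row array holding rows 1 and 3 of data.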

Example call

Sub ExampleCall()
'[0]define range and assign data to 1-based 2-dim datafield
    With Sheet1                   ' << reference to your project's sheet Code(Name)
        Dim lastRow: lastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
        Dim rng:  Set rng = .Range("C2:E" & lastRow)
    End With
    Dim data: data = rng        ' assign data to 2-dim datafield
'[1]get uniques (column 1) and remove duplicate rows
    LeaveUniques data           ' << call procedure LeaveUniques (c.f. RemoveDuplicates)
'[2]overwrite original range
    rng.Clear
    rng.Resize(UBound(data), UBound(data, 2)) = data
End Sub

Procedure LeaveUniques

Sub LeaveUniques(ByRef data As Variant, Optional ByVal colNum As Long = 1)
'Purpose: procedure removes duplicates of given column number in entire array
    data = Application.Index(data, uniqueRowIndices(data, colNum), nColIndices(UBound(data, 2)))
End Sub

Helper functions for LeaveUniques

Function uniqueRowIndices(data, Optional ByVal colNum As Long = 1)
'Purpose: return data index numbers referring to uniques
'a) set late bound dictionary to memory
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
'b) slice e.g. first data column (colNum = 1)
    Dim colData
    colData = Application.Index(data, 0, colNum)
'c) fill dictionary with uniques referring to their first occurrences
    Dim i As Long
    For i = 1 To UBound(colData)
        If Not dict.Exists(colData(i, 1)) Then dict(colData(i, 1)) = i
    Next
'd) return 2-dim array of valid unique 1-based index numbers
    uniqueRowIndices = Application.Transpose(dict.items)
End Function

Function nColIndices(ByVal n As Long)
'Purpose: return "flat" array of n column indices, e.g. for n = 3 ~> Array(1, 2, 3)
    nColIndices = Application.Transpose(Evaluate("row(1:" & n & ")"))
End Function
