excel - 宏以在excel工作表的各个列中进行所有可能的数据组合？

Question

在另一篇文章中，用户Excellll提供了上述问题的宏地址。

我有一个工作表，其数据如下：

A                      B           C
abc,def,ghi,jkl      1,2,3     a1,e3,h5,j8

以下解决方案将其变为

abc  1  a1
abc  2  a1
abc  3  a1
abc  1  e3
abc  2  e3
abc  3  h5

但是，我想知道随着数据列数从 3 列数据增长到 10 列数据，如何修改宏。

我尝试根据我看到的代码中的模式多次修改宏，但我一直收到错误消息。

score 0 · Accepted Answer

这是一个通用的解决方案，它使用递归来处理任意数量的列（大于 1）

Sub Combinations()
    Dim aSrc As Variant

    ' Get Data into an array
    '  This section is an example to get the source data into an array
    '  Replace this section if your data is sourced differently.
    '  The required format of aSrc is Array(1 To NumberOfColumnsOfData)
    '  where each element aSrc(n) is Array(1 To NumberOfRowsInColumnN, 1 To 1) of Variant
    Dim rSrc As Range, colR As Range
    Dim sh As Worksheet
    Dim a As Variant
    Dim i As Long
    Set sh = ActiveSheet ' <-- Adjust to suit
    Set rSrc = sh.[A:D]  ' <-- Adjust to suit
    ReDim aSrc(1 To rSrc.Columns.Count)
    With sh
        For i = 1 To rSrc.Columns.Count
            Set colR = rSrc.Columns(i)
            aSrc(i) = .Range(colR.Cells(1, 1), colR.Cells(.Rows.Count, 1).End(xlUp))
        Next
    End With

    ' Generate output
    '  This populates aDst(1 To lSize, 1 To NumberOfSourceColumns)
    '  where lSize is total number of combinations
    Dim aDst As Variant
    Dim lSize As Long
    Dim n As Long
    Dim aBase() As String
    lSize = 1
    For i = 1 To UBound(aSrc)
        lSize = lSize * UBound(aSrc(i), 1)
    Next
    ReDim aDst(1 To lSize, 1 To UBound(aSrc))
    ReDim aBase(0 To UBound(aSrc) - 1)
    n = 1
    aBase = Split(String(UBound(aSrc) - 1, ","), ",")
    aBase(0) = aSrc(1)(1, 1)
    Generate aSrc, aDst, aBase, 1, n

    ' Place output into sheet
    '   Starting at cell rDst
    Dim rDst As Range
    Set rDst = [E1]  ' <-- Adjust to suit
    Set rDst = rDst.Resize(UBound(aDst, 1), UBound(aDst, 2))
    rDst = aDst

End Sub

Private Sub Generate(ByRef aSrc As Variant, ByRef aDst As Variant, ByRef aBase As Variant, ByVal pCol As Long, ByRef pDst As Long)
    Dim i As Long, j As Long
    If pCol = UBound(aSrc) Then
        ' If iterating the last source column, output to aDst
        For i = 1 To UBound(aSrc(pCol), 1)
            For j = 1 To UBound(aBase)
                aDst(pDst, j) = aBase(j - 1)
            Next
            aDst(pDst, j) = aSrc(pCol)(i, 1)
            pDst = pDst + 1
        Next
    Else
        ' If NOT iterating the last source column, aBase and call Generate again
        For i = 1 To UBound(aSrc(pCol), 1)
            aBase(pCol - 1) = aSrc(pCol)(i, 1)
            Generate aSrc, aDst, aBase, pCol + 1, pDst
        Next
    End If
End Sub

score 0 · Accepted Answer

我是递归的粉丝，但前提是我相信它提供了最简单的解决方案。我不认为它适合这个问题。

在最初的问题中，UJ9 有：

Column    A                B         C
Row 1     abc,def,ghi,jkl  1,2,3     a1,e3,h5,j8

并想要：

Column    A    B   C
Row  1    abc  1   a1
Row  2    abc  2   a1
Row  3    abc  3   a1
Row  4    abc  1   e3
Row  5    abc  2   e3
Row  6    abc  3   h5
 :
Row 48    jkl  3   j8

user1657410 想要相同但有 10 列。

原始问题的解决方案使用三个（每列一个）嵌套的 for 循环。为十个嵌套的 for 循环调整这些解决方案是可能的，但实现起来并不容易。让我们考虑这些解决方案背后的原理，然后寻找不同的实施策略。

如果我们索引每列中的值，我们会得到：

Column    A                B         C
Row 1     abc,def,ghi,jkl  1,2,3     a1,e3,h5,j8
Index     0   1   2   3    0 1 2     0  1  2  3

解决方案所做的是生成索引的每个组合：000 001 002 003 010 011 012 013 020 021 021 023 100 ... 323 并使用数字从适当的字符串中选择适当的子字符串。

为了使这种方法适用于更多的列，我们需要从嵌套的 for 循环切换到每列一个条目的数组。一个数组保存列索引的最大值，另一个保存当前选定的索引。初始状态将类似于：

Column               A    B    C    D    E    F    G    H    I    J
Maximum index array  4    3    4    4    3    2    6    3    4    2
Current index array  0    0    0    0    0    0    0    0    0    0

我们现在需要一个循环来增加当前索引数组，就像速度表一样，除了每列都有自己的最大值。也就是说，我们希望在 Current 索引数组的最右边的元素上加一，除非它已经是最大值。如果它处于最大值，则将其重置为零，并且左侧的下一列将递增，除非它处于最大值。这一直持续到循环想要将最左边的索引增加到超过其最大值为止。也就是说，我们需要一个循环，它将当前索引数组设置为以下值：

Column               A    B    C    D    E    F    G    H    I    J
Maximum index array  4    3    4    4    3    2    6    3    4    2
Current index array  0    0    0    0    0    0    0    0    0    0
                     0    0    0    0    0    0    0    0    0    1
                     0    0    0    0    0    0    0    0    0    2
                     0    0    0    0    0    0    0    0    1    0
                     0    0    0    0    0    0    0    0    1    1
                     0    0    0    0    0    0    0    0    1    2
                     0    0    0    0    0    0    0    0    2    0
                     0    0    0    0    0    0    0    0    2    1
                     0    0    0    0    0    0    0    0    2    2
                     0    0    0    0    0    0    0    0    3    0
                     0    0    0    0    0    0    0    0    3    1
                     0    0    0    0    0    0    0    0    3    2
                     0    0    0    0    0    0    0    1    0    0
       :      :
                     4    3    4    4    3    2    6    3    4    2

对于当前索引数组的每个不同值，您从每一列中选择适当的子字符串并生成包含子字符串的行。

在我们继续之前，您确定要为每个子字符串组合生成一行吗？使用我为示例选择的最大索引值，您将获得 2,520,000 行。

下面的代码假设源行是第 1 行。它从第 3 行开始输出生成的行。此代码生成与上面类似的表，因此您可以正确理解代码的工作原理。此代码下方是修改它以输出子字符串的说明。代码会根据源行中的列数进行调整。该代码不会检查您的 Excel 版本是否可以支持生成的行数。

Sub Combinations()

  Dim ColCrnt As Long
  Dim ColMax As Long
  Dim IndexCrnt() As Long
  Dim IndexMax() As Long
  Dim RowCrnt As Long
  Dim SubStrings() As String
  Dim TimeStart As Single

  TimeStart = Timer

  With Worksheets("Combinations")

    ' Use row 1 as the source row.  Find last used column.
    ColMax = .Cells(1, Columns.Count).End(xlToLeft).Column

    ' Size Index arrays according to number of columns
    ' Use one based arrays so entry number matches column number
    ReDim IndexCrnt(1 To ColMax)
    ReDim IndexMax(1 To ColMax)

    ' Initialise arrays
    For ColCrnt = 1 To ColMax
      SubStrings = Split(.Cells(1, ColCrnt).Value, ",")
      ' SubStrings is a zero-based array with one entry
      ' per comma separated value.
      IndexMax(ColCrnt) = UBound(SubStrings)
      IndexCrnt(ColCrnt) = 0
    Next

    RowCrnt = 3     ' Output generated values starting at row 3

    Do While True

      ' Use IndexCrnt() here.
      ' For this version I output the index values
      For ColCrnt = 1 To ColMax
        ' This will generate an error if RowCrnt exceeds the maximum number
        ' of columns for your version of Excel.  
        .Cells(RowCrnt, ColCrnt).Value = IndexCrnt(ColCrnt)
      Next
      RowCrnt = RowCrnt + 1

      ' Increment values in IndexCrnt() from right to left
      For ColCrnt = ColMax To 1 Step -1
        If IndexCrnt(ColCrnt) < IndexMax(ColCrnt) Then
          ' This column's current index can be incremented
          IndexCrnt(ColCrnt) = IndexCrnt(ColCrnt) + 1
          Exit For
        End If
        If ColCrnt = 1 Then
          ' Leftmost column has overflowed.
          ' All combinations of index value have been generated.
          Exit Do
        End If
        IndexCrnt(ColCrnt) = 0
        ' Loop to increment next column
      Next

    Loop

  End With

  Debug.Print Format(Timer - TimeStart, "#,###.##")

End Sub

如果你很高兴你理解了上面的代码，请替换：

      ' For this version I output the index values
      For ColCrnt = 1 To ColMax
        .Cells(RowCrnt, ColCrnt).Value = IndexCrnt(ColCrnt)
      Next

经过：

      For ColCrnt = 1 To ColMax
        SubStrings = Split(.Cells(1, ColCrnt).Value, ",")
        .Cells(RowCrnt, ColCrnt).Value = SubStrings(IndexCrnt(ColCrnt))
      Next

此修改后的代码为每个组合输出适当的子字符串，但是对于大量组合它会很慢，因为它从源单元格中为每个生成的行提取所需的子字符串。例如，它在 12.66 秒内生成 27,648 行。下面的代码需要 9.15 秒，但使用了更高级的技术。

第1步，更换：

  Dim SubStrings() As String

经过：

  Dim SubStrings() As Variant

使用Dim SubStrings() As String, SubString(N) 只能包含一个字符串。使用Dim SubStrings() As Variant, SubString(N) 可以包含字符串或整数或浮点值。这在大多数情况下都不好，因为变体的处理速度比字符串或 long 慢，并且如果您将其设置为代码的错误类型的值，您将不会收到警告。但是，我将在 SubString(N) 中存储一个数组。我将使用所谓的参差不齐的数组，因为每一行都有不同的列数。

第二步，更换：

    ReDim IndexCrnt(1 To ColMax)
    ReDim IndexMax(1 To ColMax)

经过：

    ReDim IndexCrnt(1 To ColMax)
    ReDim IndexMax(1 To ColMax)
    ReDim SubStrings(1 To ColMax)

第三步，更换：

    ' Initialise arrays
    For ColCrnt = 1 To ColMax
      SubStrings = Split(.Cells(1, ColCrnt).Value, ",")
      ' SubStrings is a zero-based array with one entry
      ' per comma separated value.
      IndexMax(ColCrnt) = UBound(SubStrings)
      IndexCrnt(ColCrnt) = 0
    Next

经过：

    ' Initialise arrays
    For ColCrnt = 1 To ColMax
      SubStrings(ColCrnt) = Split(.Cells(1, ColCrnt).Value, ",")
      IndexMax(ColCrnt) = UBound(SubStrings(ColCrnt))
      IndexCrnt(ColCrnt) = 0
    Next

在第一个版本中，每次拆分单元格时，我都会覆盖数组 SubStrings。在第二个版本中，我保存了每一列的子字符串。使用 UJ9 在原始问题中使用的值，新的 SubString 看起来像：

        ---- Columns -----
Row     0    1    2    3  
  1     abc  def  ghi  jkl
  2     1    2    3
  3     a1   e3   h5   j8

第4步：替换：

      For ColCrnt = 1 To ColMax
        SubStrings = Split(.Cells(1, ColCrnt).Value, ",")
        .Cells(RowCrnt, ColCrnt).Value = SubStrings(IndexCrnt(ColCrnt))
      Next

经过：

      For ColCrnt = 1 To ColMax
        .Cells(RowCrnt, ColCrnt).Value = SubStrings(ColCrnt)(IndexCrnt(ColCrnt))
      Next

使用修改后的代码，我不会为每个生成的值拆分源单元格。我从数组中提取我需要的子字符串。

注意：如果您曾经使用过二维数组，您将编写类似MyArray(Row,Column). 参差不齐的数组是不同的；你写MyArray(Row)(Column)。

excel - 宏以在excel工作表的各个列中进行所有可能的数据组合？

2 回答 2

Related

Reference