I have a dataset of roughly 50,000 rows. Each row (or cell) holds a list of comma-separated values, for example:

item 1, item 2, item 1, item 1, item3, item 2, item 4, item3

The desired output is simply the unique items:

item 1, item 2, item3, item 4

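I can use Excel, OpenOffice Calc, Notepad++, or any other freely available program. (I found a JavaScript solution, but it works on a single string. Trying to run it 50,000 times either fails or takes more time than I have, and I don't know enough JS to adapt it.)

Any suggestions on how to do this?

<Edit: note that some items will contain spaces>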

1 Answer

This should get you started. Turn off screen updating and automatic calculation for better performance.

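Sub Tester()

    Dim dict As Object
    Dim arrItems As Variant, c As Range, y As Long
    Dim token As String

    ' Late-bound Scripting.Dictionary; its keys are the unique items of one cell
    Set dict = CreateObject("Scripting.Dictionary")

    ' Example range only: widen it to cover all of your rows
    For Each c In ActiveSheet.Range("A1:A100").Cells

        arrItems = Split(c.Value, ",")
        dict.RemoveAll  ' start fresh for each cell
        For y = LBound(arrItems) To UBound(arrItems)
            ' Trim removes only the outer spaces, so "item 1" keeps its inner space
            token = Trim(arrItems(y))
            ' Skip blanks (e.g. from a trailing comma) and items already seen
            If Len(token) > 0 Then
                If Not dict.Exists(token) Then dict.Add token, 1
            End If
        Next y

        ' Write the sorted, de-duplicated list to the next column
        c.Offset(0, 1).Value = Join(ArraySort(dict.Keys), ", ")

    Next c

End Sub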

To sort the keys:

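Function ArraySort(MyArray As Variant) As Variant

    ' Long instead of Integer so the indexes cannot overflow on large arrays
    Dim First           As Long
    Dim Last            As Long
    Dim i               As Long
    Dim j               As Long
    Dim Temp

    ' Simple O(n^2) exchange sort; fine for the handful of items in one cell
    First = LBound(MyArray)
    Last = UBound(MyArray)
    For i = First To Last - 1
        For j = i + 1 To Last
            ' Swap whenever a later element sorts before the element at i
            If MyArray(i) > MyArray(j) Then
                Temp = MyArray(j)
                MyArray(j) = MyArray(i)
                MyArray(i) = Temp
            End If
        Next j
    Next i
    ArraySort = MyArray

End Function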
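
The advice about screen updating and calculation could look something like the wrapper below. This is only a minimal sketch and not part of the original answer: `RunTesterFast` is a hypothetical name, and it assumes that `Tester` lives in the same VBA project and that a workbook is open when the macro runs.

```vba
Sub RunTesterFast()

    ' Hypothetical wrapper (assumption): runs Tester with screen updating
    ' and automatic calculation switched off, then restores both settings.
    Dim prevCalc As XlCalculation
    Dim prevScreen As Boolean
    Dim errNum As Long
    Dim errMsg As String

    ' Remember the user's current settings so they can be restored
    prevCalc = Application.Calculation
    prevScreen = Application.ScreenUpdating

    On Error GoTo CleanUp
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    Tester

CleanUp:
    ' Capture any error first, then restore the settings in all cases
    errNum = Err.Number
    errMsg = Err.Description
    Application.Calculation = prevCalc
    Application.ScreenUpdating = prevScreen
    If errNum <> 0 Then MsgBox "Tester failed: " & errMsg

End Sub
```

Restoring the saved values, rather than hard-coding `True` and `xlCalculationAutomatic`, leaves the workbook in whatever state the user had before the macro ran.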
answered 2012-06-27T19:51:21.550