performance - 在 Tcl 中用唯一值填充列表更快的方法是什么？

Question

我想创建一个唯一值列表。这些值取自不同的来源和。有两种方法可以填充我的最终列表。

将所有值放入，然后执行lrmdups：

set finalList [list ]
foreach selcetion  $selectionList {
    regexp {(\d+):(\d+)} $selection -> start end
    for {set i $start} {$i <= $end} {incr i} {
        lappend finalList $i
    }
}
set finalList [lrmdups $finalList]

或检查列表中是否存在值，并且仅当它不添加它时：

set finalList [list ]
foreach selcetion  $selectionList {
    regexp {(\d+):(\d+)} $selection -> start end
    for {set i $start} {$i <= $end} {incr i} {
        if {[lsearch $finalList $i] == -1} {
            lappend finalList $i
        }
    }
}

这两种方法哪个更快？

score 5 · Accepted Answer

使用time命令测试 Tcl 代码的性能。确保将代码放在一个过程中以获得对其进行字节编译的好处，然后使用 time 命令多次运行测试并获得每次迭代的平均时间。例如，这里有一个例子说明为什么 expr 表达式应该总是被大括号括起来。

% proc a {} {expr 1 + 2 + 3}
% proc b {} {expr {1 + 2 + 3}}
% time a 1000
4.491 microseconds per iteration
% time b 1000
0.563 microseconds per iteration

为了处理特定任务 - 我会将每个新值添加到一个数组中，让它吃掉重复项，然后在最后将它变成一个列表。

proc getUniques {wantedSize} {
    array set uniques {}
    while {[array size uniques] != $wantedSize} {
         set uniques([getNewValue]) {}
    }
    return [array names uniques]
}

score 4 · Accepted Answer

我还使用该time命令进行基准测试。这是我的代码，我添加了另外两种方法，一种用于使用array，另一种用于struct::set消除重复。

#!/usr/bin/env tclsh
#http://stackoverflow.com/questions/18337534/what-way-is-faster-to-populate-a-list-with-unique-values-in-tcl

package require Tclx
package require struct::set

proc removeDupMethod {selectionList} {
    set finalList [list ]
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            lappend finalList $i
        }
    }
    set finalList [lrmdups $finalList]
    return $finalList
}

proc searchBeforInsertMethod {selectionList} {
    set finalList [list ]
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            if {[lsearch $finalList $i] == -1} {
                lappend finalList $i
            }
        }
    }
}

proc useArrayMethod {selectionList} {
    array set tally {}
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            incr tally($i)
        }
    }
    set finalList [array names tally]
    return $finalList
}

proc useStructSetMethod {selectionList} {
    set finalList {}
    foreach selection $selectionList {
        regexp {(\d+):(\d+)} $selection -> start end
        for {set i $start} {$i <= $end} {incr i} {
            struct::set include finalList $i
        }
    }
    return $finalList
}

# Performs the benchmark on a method
proc bench {methodName} {
    set selectionList {1:10 5:20 10:30 4:30}
    set timeInfo [time {$methodName $selectionList} 1000]
    puts "$methodName - $timeInfo"
}

# main
bench removeDupMethod
bench searchBeforInsertMethod
bench useArrayMethod
bench useStructSetMethod

结果：

removeDupMethod - 281.961364 microseconds per iteration
searchBeforInsertMethod - 93.984991 microseconds per iteration
useArrayMethod - 122.354889 microseconds per iteration
useStructSetMethod - 576.293311 microseconds per iteration

讨论

您的第二种方法 ,searchBeforInsertMethod是最快的。
useArrayMethod，它使用数组来确保唯一性，排在第二位。这就是说TCL的内置列表命令是非常优化的。
令我惊讶的是，这useStructSetMethod是最慢的。我认为应该优化库命令，但我错了。

更新

我接受了 Siyb 的提示并替换：

regexp {(\d+):(\d+)} $selection -> start end

和：

set range [split $selection :]
set start [lindex $selection 0]
set end [lindex $selection 1]

并看到速度显着提高：

removeDupMethod - 9.337442 microseconds per iteration
searchBeforInsertMethod - 5.528975999999999 microseconds per iteration
useArrayMethod - 6.8120519999999996 microseconds per iteration
useStructSetMethod - 5.774831 microseconds per iteration
useNative - 6.105141 microseconds per iteration

笔记

最快的还是searchBeforInsertMethod，速度提升近17倍。
useStructSetMethod现在上升到第二位

更新 2

根据 potrzebie 的要求，我在开头添加了 5000:6000 并且数字变化不大：

removeDupMethod - 10.826106 microseconds per iteration
searchBeforInsertMethod - 6.296769 microseconds per iteration
useArrayMethod - 7.752042 microseconds per iteration
useStructSetMethod - 6.910305999999999 microseconds per iteration
useNative - 7.274724 microseconds per iteration

score 2 · Accepted Answer

I have tried using lsort -unique $list instead of lrmdups. On my box, this is the fastest method:

proc useNative {selectionList} {
        foreach selection $selectionList {
            regexp {(\d+):(\d+)} $selection -> start end
            for {set i $start} {$i <= $end} {incr i} {
                lappend finalList $i
            }
        }
        set finalList [lsort -unique $finalList]
        return $finalList
}

removeDupMethod - 171.573 microseconds per iteration
searchBeforInsertMethod - 58.264 microseconds per iteration
useArrayMethod - 96.109 microseconds per iteration
useStructSetMethod - 386.889 microseconds per iteration
useNative - 41.556 microseconds per iteration

EDIT: using split instead of the regular expression speeds up things as well (if the regex is actually part of your problem):

useNative - 20.938 microseconds per iteration

EDIT2: try adding -integer as a lsort parameter, should speed up things a little as well, if your are planning on sorting integers that is

performance - 在 Tcl 中用唯一值填充列表更快的方法是什么？

3 回答 3

讨论

更新

更新 2

Related

Reference