You can first build a table of running sums of the frequencies. So if you have the following data:
%freq = (
a => 15,
b => 25,
c => 30,
d => 20
);
The running sums would be:
%running_sums = (
a => 0,
b => 15,
c => 40, # 15 + 25
d => 70, # 15 + 25 + 30
);
$max_sum = 90; # 15 + 25 + 30 + 20
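To pick a single letter weighted by frequency, pick a random number in the range [0,90), then do a linear search on the running_sum table to find the range that contains that number. For example, if your random number is 20, the matching range is 15-40, which is the letter 'b'. Using linear search gives a total runtime of O(m*n), where m is the number of letters we need and n is the size of the alphabet (therefore m=16, n=26). This is basically what @default locale is doing.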
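As a minimal sketch of the linear-search approach, assuming the four-letter %freq table from above (pick_linear is a hypothetical helper name used only for illustration, not something from the original code):

```perl
use strict;
use warnings;

my %freq = (a => 15, b => 25, c => 30, d => 20);
my @letters = sort keys %freq;

# Build the running sums: each letter maps to the start of its range.
my $sum = 0;
my %running_sum;
for my $letter (@letters) {
    $running_sum{$letter} = $sum;
    $sum += $freq{$letter};
}

# pick_linear is a hypothetical helper. Given a number in [0, $sum), it scans
# from the last letter down and returns the first letter whose range start
# is less than or equal to that number. Each call is O(n).
sub pick_linear {
    my ($num) = @_;
    for my $letter (reverse @letters) {
        return $letter if $num >= $running_sum{$letter};
    }
    return $letters[0];    # only reached if $num is negative
}

# Pick 16 letters, one linear search per letter: O(m*n) in total.
my @picked = map { pick_linear(int(rand($sum))) } 1 .. 16;
print join('', @picked), "\n";
```

With these frequencies, pick_linear(20) lands in the range 15-40 and returns 'b', matching the example above.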
Instead of a linear search, you can also do a binary search on the running_sum table to find the closest number rounding down. This gives a total runtime of O(m*log(n)).
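The binary-search variant could look roughly like this. It is a sketch under the same assumptions as the example data above; pick_binary and @starts are hypothetical names, and the range starts are kept in an array parallel to @letters so they can be indexed by position:

```perl
use strict;
use warnings;

my %freq = (a => 15, b => 25, c => 30, d => 20);
my @letters = sort keys %freq;

# Range starts in the same order as @letters: (0, 15, 40, 70).
my $sum = 0;
my @starts;
for my $letter (@letters) {
    push @starts, $sum;
    $sum += $freq{$letter};
}

# pick_binary is a hypothetical helper. It returns the letter at the largest
# index whose range start is <= $num, i.e. "closest number rounding down".
sub pick_binary {
    my ($num) = @_;
    my ($lo, $hi) = (0, $#starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);    # round up so the loop terminates
        if ($starts[$mid] <= $num) {
            $lo = $mid;
        }
        else {
            $hi = $mid - 1;
        }
    }
    return $letters[$lo];
}

# Pick 16 letters, one binary search per letter: O(m*log(n)) in total.
my @picked = map { pick_binary(int(rand($sum))) } 1 .. 16;
print join('', @picked), "\n";
```

Rounding the midpoint up matters here: with a rounded-down midpoint, the `$lo = $mid` branch could repeat forever once `$hi == $lo + 1`.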
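For picking m letters, however, there is a faster way than O(m*log(n)), particularly if n < m. First you generate m random numbers in sorted order (which can be done without sorting, in O(m)), and then you do a linear matching of ranges between the list of sorted random numbers and the list of running sums. This gives a total runtime of O(m+n). The whole code runs on Ideone.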
use strict;
use warnings;
use List::Util qw(shuffle);
my %freq = (...);
# list of letters in sorted order, i.e. "a", "b", "c", ..., "x", "y", "z"
# sorting is O(n*log(n)) but it can be avoided if you already have
# a list of letters you're interested in using
my @letters = sort keys %freq;
# compute the running_sums table in O(n)
my $sum = 0;
my %running_sum;
for(@letters) {
$running_sum{$_} = $sum;
$sum += $freq{$_};
}
# generate letters with the frequencies given in %freq, in O(m+n)
my $curmax = 1;
my $curletter = $#letters;
my $i = 16; # the number of letters we want to generate
my @result;
while ($i > 0) {
# $curmax takes on uniformly distributed random values in (0,1], in decreasing order
# see http://repository.cmu.edu/cgi/viewcontent.cgi?article=3483&context=compsci
$curmax = $curmax * (1-rand())**(1. / $i);
# scale the random number $curmax to [0,$sum)
my $num = int ($curmax * $sum);
# find the range that includes $num
while ($num < $running_sum{$letters[$curletter]}) {
$curletter--;
}
push(@result, $letters[$curletter]);
$i--;
}
# @result comes out sorted (descending by letter), so you may want to shuffle it first
# Fisher-Yates shuffle is O(m)
print join('', shuffle(@result)), "\n";