我目前正在阅读 Advanced Bash-Scripting Guide 并发现以下内容:
# Generate binary choice, that is, "true" or "false" value.
BINARY=2
T=1
number=$RANDOM
let "number %= $BINARY"
# Note that let "number >>= 14" gives a better random distribution
#+ (right shifts out everything except last binary digit).
if [ "$number" -eq $T ]
then
echo "TRUE"
else
echo "FALSE"
fi
echo
为什么建议取第 15 位而不是第 1 位?使用二元决策的几次运行表明两者之间没有显着差异。
// 更新 因为有人问我是如何计算分布的,所以我们开始吧。我生成了几个 $RANDOM 数字,取了每个数字的第 15 位和第 1 位,并创建了两个二进制序列。之后,我遍历了这些序列,检查了 1 和 0 的链(运行),计算了最大长度序列将生成多少次运行(供参考)并将所有内容打印到一个混乱的表中。这是所有荣耀的代码(对不起,脏代码......):
#! /bin/bash
COUNT=10000
RUN=1
# generate 2 sequences based on the same $RANDOM numbers
# seq1 = modulo 2, seq2 = bitshift 14
while [ $RUN -le $COUNT ]
do
number=$RANDOM
let 'var1=number%2'
var2=$number
let 'var2 >>= 14'
seq1="${seq1}${var1}"
seq2="${seq2}${var2}"
(( RUN+=1 ))
done
# loop through sequences and check for chains of 1 and 0 (runs)
length=${#seq1}
prevSym=${seq1:0:1}
currRun="${prevSym}"
for (( i=1; i<length; i++ )); do
currSym=${seq1:$i:1}
if (( currSym==prevSym )); then
currRun="${currRun}${currSym}"
(( i!=length-1 )) && continue
(( runStat1[${#currRun}]++ )) #case: ends with run length > 1
break
fi
(( runStat1[${#currRun}]++ ))
(( prevSym=currSym ))
(( i==length-1 )) && (( runStat1[1]++ )) #case: ends with run length = 1
currRun="${currSym}"
done
length=${#seq2}
prevSym=${seq2:0:1}
currRun="${prevSym}"
for (( i=1; i<length; i++ )); do
currSym=${seq2:$i:1}
if (( currSym==prevSym )); then
currRun="${currRun}${currSym}"
(( i!=length-1 )) && continue
(( runStat2[${#currRun}]++ )) #case: ends with run length > 1
break
fi
(( runStat2[${#currRun}]++ ))
(( prevSym=currSym ))
(( i==length-1 )) && (( runStat2[1]++ )) #case: ends with run length = 1
currRun="${currSym}"
done
# print results and expected frequency
# number of expected runs with runlength k:
# 1/2**k if k<n, 1/2**(k-1) if k=n
# $RANDOM generates random numbers in the range 0 to 32768 thus n=15
n=15
echo -e "Length L of run | # of runs with %2 | # of runs with >>14 | # of runs with MLS (calculated)\n "
echo -e "L\t|%2\t|>>14\t|MLS"
echo -e "-----------------------------------\n"
sorted="${!runStat1[*]} ${!runStat2[*]}"
sorted=$(echo $sorted | tr ' ' '\n' | sort -n | uniq)
for a in $sorted; do
k=${a}
(( ${a}==${n} )) && (( k=a-1 ))
prob=$(awk -v k=${a} -v c=${COUNT} 'BEGIN { print (((1/2)**k)*c)/k}')
echo -e "${a} \t| ${runStat1[$a]} \t| ${runStat2[$a]} \t| ${prob} "
done
运行它会打印出以下内容:
Length L of run | # of runs with %2 | # of runs with >>14 | # of runs with MLS (calculated)
L |%2 |>>14 |MLS
-----------------------------------
1 | 2495 | 2450 | 5000
2 | 1219 | 1212 | 1250
3 | 638 | 621 | 416.667
4 | 300 | 329 | 156.25
5 | 162 | 166 | 62.5
6 | 75 | 81 | 26.0417
7 | 46 | 34 | 11.1607
8 | 23 | 26 | 4.88281
9 | 13 | 7 | 2.17014
10 | 2 | 6 | 0.976562
11 | 1 | 1 | 0.443892
13 | 3 | | 0.0939002
15 | | 2 | 0.0203451
21 | | 1 | 0.000227065
这使我得出的结论是,不出所料,并且在所有 bash 参考资料中也提到,$RANDOM 是一个可怕的随机性来源......但“number >>= 14”也没有比“number %=”更好的随机分布2" 用于二元选择。
......或者我在这个巨大的愚蠢计算中犯了一个巨大的错误。你告诉我。