linux - Bash script to find the frequency of every letter in a file

Question

I am trying to find the frequency of each letter of the English alphabet in an input file. How can I do this in a bash script?

score 31 · Accepted Answer

My solution uses grep, sort and uniq.

grep -o . file | sort | uniq -c

To ignore case:

grep -o . file | sort -f | uniq -ic
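The pipelines above count every character, including spaces, digits and punctuation. Since the question asks only about letters of the English alphabet, here is a minimal sketch that lowercases the input and keeps only a-z before counting. The function name letter_freq is a hypothetical helper, not part of the original answer, and the final awk step is only there to print "letter count" instead of the padded output of uniq -c.

```shell
#!/usr/bin/env bash
# letter_freq FILE  (hypothetical helper name, not from the original answer)
# Prints one "letter count" line for each ASCII letter found in FILE,
# treating upper and lower case as the same letter.
letter_freq() {
    # LC_ALL=C keeps [a-z] limited to the 26 ASCII letters.
    LC_ALL=C tr '[:upper:]' '[:lower:]' < "$1" \
        | LC_ALL=C grep -o '[a-z]' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'   # swap "count letter" into "letter count"
}
```

Call it as letter_freq file. Letters that never occur in the file are simply absent from the output.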

score 25 · Accepted Answer

Just one awk command:

awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file

If you want it to be case-insensitive, add tolower():

awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file

If you want only letters:

awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file

If you want only digits, change /[a-zA-Z]/ to /[0-9]/.

If you do not want Unicode characters to show up, run export LC_ALL=C first.
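Two details of the awk commands above are worth keeping in mind: splitting a line into characters with an empty FS is an extension found in gawk and mawk rather than something every awk guarantees, and the order in which for (i in w) visits the array is unspecified. The sketch below is one way to sidestep both, assuming only standard awk functions (tolower, length, substr) and an external sort. The name awk_letter_freq is hypothetical.

```shell
#!/usr/bin/env bash
# awk_letter_freq FILE  (hypothetical helper name, not from the original answer)
# Counts ASCII letters case-insensitively without relying on an empty FS,
# then sorts the result alphabetically.
awk_letter_freq() {
    LC_ALL=C awk '
    {
        line = tolower($0)
        n = length(line)
        # Walk the line one character at a time with substr().
        for (i = 1; i <= n; i++) {
            c = substr(line, i, 1)
            if (c ~ /[a-z]/) w[c]++
        }
    }
    END { for (c in w) print c, w[c] }' "$1" | sort
}
```

Piping through sort -k2,2nr instead of plain sort would rank the letters by count rather than alphabetically.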

score 8 · Accepted Answer

A solution with sed, sort and uniq:

sed 's/\(.\)/\1\n/g' file | sort | uniq -c

This counts all characters, not only letters. You can filter the others out:

sed 's/\(.\)/\1\n/g' file | grep '[A-Za-z]' | sort | uniq -c

If you want to treat uppercase and lowercase as the same, just add a translation:

sed 's/\(.\)/\1\n/g' file | tr '[:upper:]' '[:lower:]' | grep '[a-z]' | sort | uniq -c

score 4 · Accepted Answer

Here is a suggestion:

while read -n 1 c
do
    echo "$c"
done < "$INPUT_FILE" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr
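The loop above reads one character at a time and leaves the counting to grep, sort and uniq. For comparison, here is a rough sketch that keeps the counting inside bash itself using an associative array. It assumes bash 4 or newer (for local -A and the ${line,,} lowercase expansion), and the name bash_letter_freq is hypothetical. Unlike the pipelines in this thread, it prints all 26 letters, including those with a count of 0.

```shell
#!/usr/bin/env bash
# bash_letter_freq FILE  (hypothetical helper name; assumes bash >= 4)
# Counts ASCII letters case-insensitively using only bash builtins.
bash_letter_freq() {
    local LC_ALL=C            # keep [a-z] and ${line,,} limited to ASCII
    local -A count=()
    local line ch i
    # "|| [[ -n $line ]]" also processes a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line,,}        # lowercase the whole line
        for (( i = 0; i < ${#line}; i++ )); do
            ch=${line:i:1}
            if [[ $ch == [a-z] ]]; then
                count[$ch]=$(( ${count[$ch]:-0} + 1 ))
            fi
        done
    done < "$1"
    # Print every letter in alphabetical order, with 0 for missing ones.
    for ch in {a..z}; do
        printf '%s %d\n' "$ch" "${count[$ch]:-0}"
    done
}
```

This avoids spawning external processes, but on large files a character-by-character loop in bash is likely to be much slower than the grep or awk approaches.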

score 0 · Accepted Answer

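Similar to mouviciel's answer above, but more portable to the Bourne and Korn shells used on BSD systems. When you don't have GNU sed, which supports \n in the replacement, you can escape a literal newline with a backslash: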

sed -e's/./&\
/g' file | sort | uniq -c | sort -nr

Or, to avoid the visual line split on screen, insert a literal newline by typing CTRL+V CTRL+J:

sed -e's/./&\^J/g' file | sort | uniq -c | sort -nr
