bash - List all words in a text file along with their occurrence counts?

Question

Suppose I have a file text.txt like this:

she likes cats, and he likes cats too.

I'd like my result to look like:

she 1
likes 2
cats 2
and 1
he 1
too 1

If including the space, comma, and period in the output makes the script easier, that's fine.

Is there a simple shell pipeline that can do this?

score 20 · Accepted Answer

Here's a one-liner near and dear to my heart:

cat text.txt | sed 's|[,.]||g' | tr ' ' '\n' | sort | uniq -c

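sed strips the punctuation (adjust the regex to taste), tr puts the result one word per line, sort groups identical words together, and uniq -c counts each group. Note that uniq -c prints the count before the word.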
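If the word-then-count layout from the question matters, the same idea can be sketched as a small function. This is only a variation, not the answer's original command: the name count_words is hypothetical, tr -cs replaces the sed step by turning every run of non-letters into a newline, and a final awk swaps the two columns.

```shell
# Hypothetical helper: count words read from stdin, printing "word count".
count_words() {
  tr -cs '[:alpha:]' '\n' |   # turn every run of non-letters into one newline
    sed '/^$/d' |             # drop the empty line left by leading punctuation
    sort |                    # group identical words so uniq can count them
    uniq -c |                 # prefix each distinct word with its count
    awk '{print $2, $1}'      # swap the columns: word first, then count
}

# Sample line from the question, fed on stdin.
printf '%s\n' 'she likes cats, and he likes cats too.' | count_words
```

The output comes out in sorted order rather than in order of first appearance, so it will not match the ordering shown in the question line for line.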

score 0 · Accepted Answer

With GNU awk, you can simply set the record separator (RS) to any sequence of non-alphabetic characters:

$ gawk -v RS='[^[:alpha:]]+' '{sum[$0]++} END{for (word in sum) print word,sum[word]}' file
she 1
likes 2
and 1
too 1
he 1
cats 2

But that doesn't address the question of how you identify a "word" in general.
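To illustrate how much the definition of a "word" changes the result, here is a rough sketch under one possible set of assumptions: case is ignored and an apostrophe counts as part of a word. The function name count_words_loose and both rules are illustrative choices, not something the question specifies.

```shell
# Hypothetical helper: case-insensitive count, apostrophes kept inside words.
count_words_loose() {
  tr '[:upper:]' '[:lower:]' |   # fold case so "STOP" and "stop" match
    tr -cs "[:alpha:]'" '\n' |   # split on anything but letters or apostrophes
    sed '/^$/d' |                # drop any empty line from leading separators
    sort |
    uniq -c |
    awk '{print $2, $1}'         # print the word first, then its count
}

printf '%s\n' "Don't stop. don't STOP" | count_words_loose
```

With a letters-only separator, "don't" would instead be split into "don" and "t", which is why the rule has to be chosen to fit the text being counted.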
