bash - Unix uniq command output to a CSV file

Question

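I have a text file (list.txt) containing single-word and multi-word English phrases. My goal is to count the occurrences of each word and write the results to a CSV file.

I have found a command that writes the number of occurrences of each unique word, sorted from largest to smallest. The command is: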

$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | less > output.txt

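The problem is how the new file (output.txt) is formatted. Each line has 3 leading spaces, then the number of occurrences, then a space, then the word, followed by a newline. Example: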

   9784 the
   6368 and
   4211 for
   2929 to

What do I need to do to get the results in a more useful format, such as CSV? For example, I would like it to be:

9784,the
6368,and
4211,for
2929,to

Or, even better:

the,9784
and,6368
for,4211
to,2929

Is there a way to do this with Unix commands, or do I need to do some post-processing in a text editor or Excel?

score 5 · Accepted Answer

Use awk as follows:

 > cat input 
   9784 the
   6368 and
   4211 for
   2929 to
 > cat input | awk '{ print $2 "," $1}'
the,9784
and,6368
for,4211
to,2929
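
This works because awk's default field splitting skips leading blanks and treats any run of spaces as a single separator, so the padding added by uniq -c never ends up in $1. A minimal sketch of the same swap, written with the output field separator OFS instead of string concatenation (the function name swap_to_csv is hypothetical and used here only for illustration):

```shell
# swap_to_csv is a hypothetical helper name, not part of the original answer.
# Reads "count word" lines on stdin and writes "word,count" lines.
swap_to_csv() {
    # OFS is placed between the comma-separated arguments of print.
    awk -v OFS=',' '{ print $2, $1 }'
}

# Sample lines in the padded format produced by uniq -c
printf '   9784 the\n   6368 and\n' | swap_to_csv
```

On the two sample lines this prints the,9784 followed by and,6368.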

Your full pipeline would be:

$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | awk '{ print $2 "," $1}' > output.txt
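
One possible way to package the pipeline above is as a shell function that reads from stdin, which makes it easy to try on a small sample before running it on list.txt. This is only a sketch under a few assumptions: the name word_counts_csv is hypothetical, the NF == 2 guard is an addition that drops the empty token tr emits when the input starts with a non-letter, and the second sort key is an addition that orders equal counts alphabetically.

```shell
# word_counts_csv is a hypothetical helper name, not from the original answer.
# Reads text on stdin and writes "word,count" lines, most frequent first.
word_counts_csv() {
    tr 'A-Z' 'a-z' |               # fold everything to lowercase
        tr -sc 'a-z' '\n' |        # turn each run of non-letters into one newline
        sort |
        uniq -c |                  # prefix each distinct word with its count
        sort -k1,1nr -k2,2 |       # count descending, then word ascending (assumption)
        awk 'NF == 2 { print $2 "," $1 }'   # skip the empty token, swap the fields
}

# Small sample input standing in for list.txt
printf 'The cat and the dog\nAnd THE bird\n' | word_counts_csv
```

With the sample input this prints the,3 then and,2 then bird,1, cat,1 and dog,1. For the real file, the call would be word_counts_csv < list.txt > output.txt.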
