bash - Find duplicates in a field and print them in unix bash

Question

I have a file that contains

apple
apple
banana
orange
apple
orange

I want a script that finds the duplicated apple and orange and tells the user the following: apple and orange are repeated. I tried

nawk '!x[$1]++' FS="," filename

to find the duplicated items. How can I print them out in unix bash?

score 11 · Accepted Answer

To print the duplicated lines, you can say:

$ sort filename | uniq -d
apple
orange
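Note that uniq only compares adjacent lines, which is why the input is sorted first. As a hedged alternative, here is a minimal awk sketch that needs no sorting and reports each duplicated line once, at the point where its second occurrence is read. The function name dups_in_order is hypothetical and only used for illustration.

```shell
#!/bin/sh
# dups_in_order is a hypothetical helper name, not part of the original answer.
# It reads lines on stdin (or from files given as arguments).
dups_in_order() {
    # seen[$0]++ yields the number of earlier occurrences of the line,
    # so the pattern is true exactly once: on the second occurrence.
    awk 'seen[$0]++ == 1' "$@"
}

# Demo with the sample data from the question.
printf '%s\n' apple apple banana orange apple orange | dups_in_order
```

Unlike the sort | uniq -d pipeline, this keeps every distinct line in memory, so it trades memory for preserving input order.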

If you also want to print the counts, pass the -c option to uniq:

$ sort filename | uniq -dc
      3 apple
      2 orange
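The question asks for a message of the form "apple and orange are repeated". A minimal sketch of one way to build that sentence from the uniq -d output follows. The function name report_dups and the temporary sample file are assumptions made for the demo, not part of the original answer.

```shell
#!/bin/sh
# report_dups is a hypothetical helper name, not part of the original answer.
report_dups() {
    sort "$1" | uniq -d | awk '
        NR > 1 { printf " and " }                 # separator before every name but the first
               { printf "%s", $0 }
        END    { if (NR) print " are repeated" }  # stay silent when nothing is duplicated
    '
}

# Demo with the sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' apple apple banana orange apple orange > "$sample"
report_dups "$sample"    # prints: apple and orange are repeated
rm -f "$sample"
```

With three or more duplicated names this joins all of them with " and "; adjust the separator if a comma-separated list reads better.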

score 4 · Accepted Answer

+1 for devnul's answer. However, if the file uses spaces rather than newlines as separators, the following will work.

tr '[:blank:]' '\n' < filename | sort | uniq -d
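If words can be separated by runs of several blanks, translating each blank to a newline produces empty lines, and uniq -d may then report the empty line as a duplicate. A hedged sketch that squeezes such runs with tr -s follows; the function name dup_words is hypothetical.

```shell
#!/bin/sh
# dup_words is a hypothetical helper name; it reads whitespace-separated words on stdin.
dup_words() {
    # -s squeezes repeated newlines in the output, so runs of blanks
    # (and blanks next to a line break) do not create empty lines.
    tr -s '[:blank:]' '\n' | sort | uniq -d
}

# Demo: two spaces and a tab are used as separators here.
printf 'apple  apple banana\norange\tapple orange\n' | dup_words
```

The character class is quoted so the shell cannot expand it as a glob pattern against file names in the current directory.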

score 1 · Accepted Answer

Update:

The question has changed significantly. Previously, when I answered it, the input file was expected to look like this:

apple apple banana orange apple orange
banana orange apple
...

The solution still works for the new input, though it may be a bit more complicated than this particular use case needs.

The following awk script will do the job:

awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output:

apple 3
orange 2
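The order in which awk's for (i in a) loop visits array keys is unspecified, so the two lines above may come out in either order. A minimal sketch that makes the order deterministic by piping the result through sort follows. The function name count_dups is hypothetical, and the loop is written as a for loop purely for brevity.

```shell
#!/bin/sh
# count_dups is a hypothetical helper name; it reads stdin or the files given as arguments.
count_dups() {
    awk '
        { for (i = 1; i <= NF; i++) a[$i]++ }                 # count every field on every line
        END { for (w in a) if (a[w] > 1) print w, a[w] }      # keep fields seen more than once
    ' "$@" | sort                                             # sort for a stable output order
}

# Demo with a single line of space-separated words.
printf 'apple apple banana orange apple orange\n' | count_dups
```

For the demo line this prints apple 3 followed by orange 2.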

Written out like this, it is easier to understand:

#!/usr/bin/awk -f

{
  i=1;
  # iterate through every field
  while(i <= NF) {
    a[$(i++)]++; # count occurrences of every field
  }
}

# after all input lines have been read ...
END {
  for(i in a) {
    # ... print those fields which occurred more than 1 time
    if(a[i] > 1) {
      print i,a[i];
    }
  }
}

Then make the file executable and run it, passing the input file name to it:

chmod +x script.awk
./script.awk your.file
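If awk is not installed at /usr/bin/awk, or the file cannot be made executable, the script can instead be handed to awk with the -f option. A small sketch follows, assuming a throwaway directory that holds a compact copy of the script and the sample input. The temporary paths and the variable names workdir and result are made up for the demo.

```shell
#!/bin/sh
# workdir and result are hypothetical names used only for this demo.
workdir=$(mktemp -d)

# A compact copy of the counting script (same logic as the longer version).
cat > "$workdir/script.awk" <<'EOF'
{ for (i = 1; i <= NF; i++) a[$i]++ }
END { for (w in a) if (a[w] > 1) print w, a[w] }
EOF

# Sample input from the question, one item per line.
printf '%s\n' apple apple banana orange apple orange > "$workdir/your.file"

# Run the script through awk -f; sort only fixes the output order.
result=$(awk -f "$workdir/script.awk" "$workdir/your.file" | sort)
printf '%s\n' "$result"

rm -rf "$workdir"
```

This form does not depend on the shebang line at all, which makes it a convenient fallback on systems where awk lives elsewhere.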
