bash - How do I print the duplicates in a file only once?

Question

I have an input file that contains:

123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple

I want the output to be: The following numbers are repeated: 123 543

Is there a way to get this output using awk? I am writing the script in bash on Solaris.

score 3 · Accepted Answer

sed -e 's/,/ , /g' filename | awk '{print $1}' | sort | uniq -d

answered 2013-08-17T16:48:56.740

score 1 · Accepted Answer

awk -vFS=',' \
     '{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; }   \
      END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile

There are also solutions based on sort/uniq/cut (see the other answers).

score 1 · Accepted Answer

If you can do without awk, you can use this to get the repeated numbers:

cut -d, -f 1 my_file.txt  | sort | uniq -d

This prints:

123
543

Edit (in response to your comment):

You can buffer the output and then decide whether to continue. For example:

out=$(cut -d, -f 1 a.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
    echo "The following numbers are repeated: $out"
    exit
fi

# continue...
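
One detail of the snippet above: `tr '\n' ' '` leaves a trailing space in `$out`. A minimal sketch of the same idea that joins the lines with `paste -s` instead, assuming the question's data is written to a throwaway file created with `mktemp` (the `input` and `dups` names are illustrative, not from the question):

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file (illustrative only).
input=$(mktemp)
cat > "$input" <<'EOF'
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
EOF

# First field -> sort -> keep one copy of each repeated key -> join on one line.
dups=$(cut -d, -f1 "$input" | sort | uniq -d | paste -s -d ' ' -)
rm -f "$input"

# Only report when at least one key is repeated.
if [ -n "$dups" ]; then
    echo "The following numbers are repeated: $dups"
fi
```

Because `sort` runs before `uniq -d`, the keys come out in sorted order rather than in the order they appear in the file.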

score 1 · Accepted Answer

This script prints only the first-column numbers that occur more than once:

awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file

Or, in a shorter form:

awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
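
Two things to keep in mind with the one-liners above: `for (i in a)` visits keys in an unspecified order, and the header is printed even when nothing is repeated. A sketch of a variant that lists each key in the order it first repeats and stays silent otherwise, assuming the question's data in a temporary file (`seen`, `list` and `report` are illustrative names). On Solaris the legacy `/usr/bin/awk` may not accept this syntax, so `nawk` or `/usr/xpg4/bin/awk` is probably needed there:

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file (illustrative only).
input=$(mktemp)
cat > "$input" <<'EOF'
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
EOF

report=$(awk -F, '
    # On the second occurrence of a key, append it to the list once.
    seen[$1]++ == 1 { list = (list == "" ? $1 : list " " $1) }
    # Print the header only if at least one key was repeated.
    END { if (list != "") print "The following numbers are repeated: " list }
' "$input")
rm -f "$input"

echo "$report"
```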

If you want the script to stop when a duplicate is found, you can have awk exit with a nonzero exit code. For example:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file

In your main script you can then do the following:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file || exit 1

Or, in a more readable format:

awk -F, '
    a[$1]++==1{
        dup=1
    }
    END{
        if (dup) {
            printf "The following numbers are repeated: ";
            for (i in a)
                if (a[i]>1)
                    printf "%s ",i;
            print "";
            exit(1)
        }
    }
' file || exit 1
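
To see how the exit status reaches the calling script, here is a small sketch that wraps the same awk logic in a shell function and captures its status instead of exiting. The function name `check_dups`, the `status` variable and the three-line sample file are assumptions made for illustration:

```shell
#!/usr/bin/env bash
# check_dups is a hypothetical helper; it returns the exit status of awk.
check_dups() {
    awk -F, '
        # Flag the run as soon as any key is seen a second time.
        ++count[$1] == 2 { dup = 1 }
        END {
            if (dup) {
                printf "The following numbers are repeated:"
                for (k in count) if (count[k] > 1) printf " %s", k
                print ""
                exit 1
            }
        }
    ' "$1"
}

# Tiny illustrative input with one repeated key.
input=$(mktemp)
printf '%s\n' '123,a,b' '543,c,d' '123,e,f' > "$input"

# Capture the status rather than exiting, so the result can be inspected.
status=0
check_dups "$input" || status=$?
rm -f "$input"

echo "check_dups exit status: $status"
```

In a real script, `check_dups file || exit 1` would stop processing at that point.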
