shell - How to remove duplicate entries from a file using shell

Question

I have a file in the following format:

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money

I am trying to remove duplicates that appear within the same line.

So, if a line contains

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi

I would like it to become

0000000540|Q1.1|margi

If a line contains

0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888

I would like it to become

0099940598|Q1.2|8888|Q1.3|5454

I want to do this in a shell script that takes an input file and outputs a file with the duplicates removed.

Thanks in advance to anyone who can help.

score 1 · Accepted Answer

This should work, though it may not be efficient for large files.

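awk '
    {
        # Reset the set of fields already printed for this line.
        delete p;
        # Split the line on "|" into a[1..n].
        n = split($0, a, "|");

        # Always print the first field (the ID).
        printf("%s", a[1]);

        # Print each later field only the first time it appears.
        for (i = 2; i <= n; i++)
        {
            if (!(a[i] in p))
            {
                printf("|%s", a[i]);
                p[a[i]] = "";
            }
        }

        printf "\n";
    }
' YourFileName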
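
Note that the awk script above deduplicates individual fields, so a value that repeats under a different key (for example `id|Q1|a|Q2|a`) would lose its second `a`. If the fields after the ID always come in key|value pairs, as the sample data suggests, a pair-aware variant might look like the sketch below. The function name `dedupe_pairs` and its two-argument interface are assumptions for illustration and not part of the original answer. It also strips trailing whitespace from each line, on the assumption that such padding is not meaningful.

```shell
#!/bin/sh
# dedupe_pairs INPUT OUTPUT
# Hypothetical helper (name and interface are illustrative assumptions).
# Assumes each line looks like ID|key|value|key|value|... and removes
# key|value pairs that repeat within the same line.
dedupe_pairs() {
    awk -F'|' '
    {
        # Drop trailing spaces, tabs, and CRs so a padded last field still matches.
        sub(/[ \t\r]+$/, "")

        split("", seen)          # portable way to empty the array for each line
        out = $1                 # the leading ID is always kept

        for (i = 2; i <= NF; i += 2) {
            # Build the key|value pair; a dangling key with no value is kept alone.
            pair = (i < NF) ? ($i "|" $(i + 1)) : $i
            if (!(pair in seen)) {
                seen[pair] = 1
                out = out "|" pair
            }
        }
        print out
    }' "$1" > "$2"
}
```

After sourcing this, a call such as `dedupe_pairs input.txt output.txt` reads the first file and writes the deduplicated lines to the second. The work is a single pass with one small lookup table per line, so it should scale roughly linearly with file size.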
