通过sort
首先对文件进行排序,然后您可以应用uniq
.
似乎对文件进行了很好的排序:
$ cat test.csv
overflow@domain2.com,2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
overflow@domain2.com,2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack3@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack4@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
$ sort test.csv
overflow@domain2.com,2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
overflow@domain2.com,2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack3@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack4@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
$ sort test.csv | uniq
overflow@domain2.com,2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0
overflow@domain2.com,2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack3@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack4@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
你也可以做一些 AWK 魔术:
$ awk -F, '{ lines[$1] = $0 } END { for (l in lines) print lines[l] }' test.csv
stack2@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack4@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
stack3@domain.com,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1
overflow@domain2.com,2009-11-27 00:58:29.646465785,2x3.net,256.255.255.0