
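I'm trying to write a script that looks at part of a line, does a sort -u or something similar to find the unique occurrences, and then displays the output in the original order of the lines. In other words, for each distinct value of that part of the line, only the line where it first occurs should appear.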

I've managed to do this with cut, but my output only shows the cut portion of the data. What can I do to get the whole line?

Here's what I have so far:

cut -d, -f6 infile.txt | cut -c4-11 | grep -n . | sort -t: -k2,2 -u | sort -t: -k1n,1 | cut -d: -f2-

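I know the data contains no extra colons (:) that would break this script. But it only outputs the unique data. How can I get the whole line? I'd rather stay away from perl, but awk is OK (although I don't know it very well).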

Sample:

If the input file looks like this (note: the letters A through H aren't in the real data; I only put them there to illustrate what I mean):

A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
C....,....,...........,.....,....,...20130718......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
F....,....,...........,.....,....,...20130714......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
H....,....,...........,.....,....,...20130718......,.........,...........,......

My program's output:

20130718
20130714
20130719
20130713
20130630

What I'd like to see:

A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......

1 Answer


Yes, awk is your best bet here. Here's a rather cryptic example:

awk -F, '!seen[substr($6,4,8)]++' infile.txt

Explanation:

options:
  -F,              set the field separator to ,

condition:
  substr($6,4,8)   up to 8 characters starting at the fourth character
                   of the sixth field
  seen[...]++      seen is an associative array (dictionary). Increment the
                   value associated with ..., and return the old value
  !seen[...]++     true if the old value was empty/zero, i.e. the key has
                   not been seen before; in that case, perform the action


action:
  There is no action, only a condition, so the default action is
  performed if the test succeeds. The default action is to print
  the line. So the line will be printed if the relevant characters of
  the sixth field haven't yet been seen.
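If the post-increment trick is hard to read, the same logic can be written out long-hand. The sketch below is intended to be equivalent to the one-liner, but it tests membership with key in seen instead of relying on seen[...]++ returning the old value. The function name first_by_key is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Long-hand equivalent of:  awk -F, '!seen[substr($6,4,8)]++'
# first_by_key is a hypothetical name used only for this illustration.
first_by_key() {
    awk -F, '
    {
        key = substr($6, 4, 8)    # up to 8 chars starting at char 4 of field 6
        if (!(key in seen)) {     # key not seen on any earlier line
            seen[key] = 1         # remember it for the lines that follow
            print                 # no argument: print the whole line ($0)
        }
    }'
}

# Sample data from the question; prints the A, B, D, E and G lines.
first_by_key <<'EOF'
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
C....,....,...........,.....,....,...20130718......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
F....,....,...........,.....,....,...20130714......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
H....,....,...........,.....,....,...20130718......,.........,...........,......
EOF
```

Both forms keep the first line for each key. A line with fewer than six fields gets an empty key, so only the first such line would be printed.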

Test:

$ awk -F, '!seen[substr($6,4,8)]++' <<EOF
> A....,....,...........,.....,....,...20130718......,.........,...........,......
> B....,....,...........,.....,....,...20130714......,.........,...........,......
> C....,....,...........,.....,....,...20130718......,.........,...........,......
> D....,....,...........,.....,....,...20130719......,.........,...........,......
> E....,....,...........,.....,....,...20130713......,.........,...........,......
> F....,....,...........,.....,....,...20130714......,.........,...........,......
> G....,....,...........,.....,....,...20130630......,.........,...........,......
> H....,....,...........,.....,....,...20130718......,.........,...........,......
> EOF
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
$
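If you'd rather keep the sort-based pipeline from the question, one option is to carry the whole line (with its line number) through the pipeline and let sort pick out the key by character position (-k6.4,6.11) instead of cutting the key out first. This is a sketch, not the method from the answer above. The function name first_by_key_sorted is hypothetical. It also assumes a sort where -s -u keeps the first line of each run of equal keys in input order. GNU coreutils sort behaves this way, but POSIX does not say which duplicate -u keeps.

```shell
#!/bin/sh
# Sort-based sketch: keep whole lines by numbering them instead of cutting
# the key out first. Assumes sort -s -u keeps the first line (in input
# order) of each run of equal keys, as GNU coreutils sort does.
# first_by_key_sorted is a hypothetical name. Usage: first_by_key_sorted < infile.txt
first_by_key_sorted() {
    grep -n . |                     # prefix each non-empty line with "N:"
    sort -s -t, -k6.4,6.11 -u |     # one line per key: chars 4-11 of field 6
    sort -t: -k1,1n |               # restore the original order by line number
    cut -d: -f2-                    # drop the "N:" prefix, keep the whole line
}
```

The first sort splits fields on commas, and the second sort only looks at the line number before the first colon. cut -d: -f2- also prints every remaining field, so colons elsewhere in the data do not break this version.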
answered 2013-07-18T21:04:05.507