0

I'm learning Unix, and I want to sort this table:

Name:Alice,ID:2368,Hometown:columbus,bithday:03/11/1988
Name:Ted,ID:2368,Hometown:Portland,bithday:06-11-1992
Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984
Name:Xiao,ID:2571,hometown:Carson,bithday:07/06/1975
Name:Rain,ID:0264,hometown:little stone,bithday:11-09-1982
Name:Susan,ID:1261,Hometown:Menlo park,bithday:12-13-1989
Name:Zack,ID:1594,Hometown:columbus,bithday:02-04-1984

and format it like this:

Rain,0264,little stone,11-09-1982
Susan,1261,Menlo park,12-13-1989
Zack,1594,columbus,02-04-1984
Mark,2218,Palo Alto,04-23-1984
Alice,2368,columbus,03-11-1988
Ted,2368,Portland,06-11-1992
Xiao,2571,Carson,07-06-1975

I want to strip the keys from the key:value pairs, then sort by ID using awk and sort.

What commands would I use to do this?

9 Answers

3

It's as simple as this:

awk -F: '{gsub(/,[^:]*:/,",");print $2}' Your_file | sort -t, -k 2,2n

Tested below:

> cat temp
Name:Alice,ID:2368,Hometown:columbus,bithday:03/11/1988
Name:Ted,ID:2368,Hometown:Portland,bithday:06-11-1992
Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984
Name:Xiao,ID:2571,hometown:Carson,bithday:07/06/1975
Name:Rain,ID:0264,hometown:little stone,bithday:11-09-1982
Name:Susan,ID:1261,Hometown:Menlo park,bithday:12-13-1989
Name:Zack,ID:1594,Hometown:columbus,bithday:02-04-1984
>

Now the execution:

> awk -F: '{gsub(/,[^:]*:/,",");print $2}' temp | sort -t, -k 2,2n
Rain,0264,little stone,11-09-1982
Susan,1261,Menlo park,12-13-1989
Zack,1594,columbus,02-04-1984
Mark,2218,Palo Alto,04-23-1984
Alice,2368,columbus,03/11/1988
Ted,2368,Portland,06-11-1992
Xiao,2571,Carson,07/06/1975
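
The output above still shows slash-separated dates (03/11/1988), while the question asks for hyphens throughout. A minimal sketch of one way to close that gap, assuming a slash never occurs outside the birthday field: add a second gsub before printing. The function name strip_keys_and_sort and the temporary file are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): strip the keys, normalize the
# date separators, and sort numerically by ID.
strip_keys_and_sort() {
  awk -F: '{
    gsub(/,[^:]*:/, ",")   # drop every ",key:" but keep the comma
    gsub("/", "-")         # assumption: "/" appears only inside dates
    print $2               # $0 was re-split on ":", so $2 is everything after "Name:"
  }' "$1" | LC_ALL=C sort -t, -k2,2n   # C locale keeps the tie order predictable
}

# Sample data copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name:Alice,ID:2368,Hometown:columbus,bithday:03/11/1988
Name:Ted,ID:2368,Hometown:Portland,bithday:06-11-1992
Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984
Name:Xiao,ID:2571,hometown:Carson,bithday:07/06/1975
Name:Rain,ID:0264,hometown:little stone,bithday:11-09-1982
Name:Susan,ID:1261,Hometown:Menlo park,bithday:12-13-1989
Name:Zack,ID:1594,Hometown:columbus,bithday:02-04-1984
EOF
strip_keys_and_sort "$tmp"
rm -f "$tmp"
```

The trick in both versions is that gsub without a third argument edits $0, and awk then re-splits the record on ":", which leaves only one colon (the one after "Name") to split on.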
Answered 2012-10-29T07:11:35.693
2

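It took me a while to figure this out, until I finally noticed that "birthday" is misspelled in your input data.

You can go all out and parse the input into actual data: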

awk -F, '
  BEGIN {
    fmt="%s,%s,%s,%s\n";
  }

  {
    for (i=1;i<=NF;i++) {           # walk through the fields...
      split($i,a,":");              # split each one at the colon, save to array
      v[tolower(a[1])]=a[2];        # need tolower() as "Hometown" is inconsistent
    }
    split(v["bithday"],b,/[-\/]/);  # regex here handles your inconsistent divider
    v["bithday"]=sprintf("%s-%s-%s",b[3],b[1],b[2]);  # input is MM-DD-YYYY, emit YYYY-MM-DD
    printf(fmt,v["name"],v["id"],v["hometown"],v["bithday"]);
  }
' input.txt | sort -t, -k2

This walks through each line, splits it into fields at the commas, stores the key:value pairs in an array, adjusts your "bithday" format, and prints. (Note that I picked a more sensible date format for you.)

But it may be easier to just do some simple pattern matching:

sed -Ene 's/^[[:alpha:]]+://;s/,[[:alpha:]]+:/,/g;s/([0-9]{2}).([0-9]{2}).([0-9]{4})$/\3-\1-\2/;p' input.txt \
| sort -t, -k2

This produces the same result with less code. If you need to do more interesting things with your input data, then of course awk is the way to go.

Oh, and my sed is from FreeBSD, so it uses the -E option to get ERE. If you're on Linux or some other purveyor of GNU sed, you can probably replace -E with -r.

Answered 2012-10-28T21:22:20.387
2

Use grep to find the values, paste to reassemble the lines, and of course sort:

grep -oP '(?<=:).*?(,|$)' filename | paste -d "" - - - - | sort -n -t, -k2

This assumes there are no commas in the values.
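
The -P option (PCRE) is specific to GNU grep. As a rough sketch of the same extract-then-regroup idea without it, the variant below matches ":value" with a basic pattern, trims the colon with cut, and lets paste rebuild each record from four consecutive lines. It additionally assumes that values contain no colons and that every record has exactly four fields; grep -o itself is an extension, though GNU and BSD grep both provide it. The function name values_by_id is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: same idea as the grep -oP pipeline, without PCRE.
values_by_id() {
  grep -o ':[^,]*' "$1" |       # one ":value" per output line
    cut -c2- |                  # drop the leading colon
    paste -d, - - - - |         # regroup every four lines into one record
    LC_ALL=C sort -t, -k2,2n    # numeric sort on the ID field only
}

# Two sample records from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984
Name:Rain,ID:0264,hometown:little stone,bithday:11-09-1982
EOF
values_by_id "$tmp"
rm -f "$tmp"
```

Because the commas are not part of the match here, paste re-inserts them with -d, instead of joining with an empty delimiter.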

Answered 2012-10-29T12:16:05.113
1

Must there be awk? If not:

  1. use Vim to remove key strings with :%s/[a-z]*://gi
  2. Use sort to sort: sort -t , -k 2 file

If AWK is a must, then I'd think of this, to sort WITHOUT losing keys - but the answer @Aif gave is nice as well.

EDIT: improved thanks to @Aif's regex and due to @Ghoti's comment. Now Vim command for substitution uses regex and key case and text 'matters not' as master Yoda would say.

Answered 2012-10-28T20:03:13.033
1

Just set FS and OFS to what you need, print the fields you want, and sort:

$ awk -F'[:,]' -v OFS=, '{print $2,$4,$6,$8}' file | sort -t, -k2n
Rain,0264,little stone,11-09-1982
Susan,1261,Menlo park,12-13-1989
Zack,1594,columbus,02-04-1984
Mark,2218,Palo Alto,04-23-1984
Alice,2368,columbus,03/11/1988
Ted,2368,Portland,06-11-1992
Xiao,2571,Carson,07/06/1975
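
To see why fields 2, 4, 6 and 8 are the ones to print, it can help to dump every field with its index. A small sketch, using the same field separator; the helper name show_fields is invented for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: print each field with its index, splitting on
# either ":" or "," exactly as -F'[:,]' does in the command above.
show_fields() {
  awk -F'[:,]' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

# One sample record from the question.
printf '%s\n' 'Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984' | show_fields
```

The keys land in the odd-numbered fields and the values in the even-numbered ones, so a value such as "Palo Alto" ends up intact in field 6.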
Answered 2012-10-29T14:18:17.707
1

Here's one way using GNU awk:

awk 'BEGIN { FS="[,:]"; OFS="," } { for (i=2; i<=NF; i+=2) printf (i!=NF) ? $i OFS : $i ORS | "sort -t, -nk2" }' file.txt

Results:

Rain,0264,little stone,11-09-1982
Susan,1261,Menlo park,12-13-1989
Zack,1594,columbus,02-04-1984
Mark,2218,Palo Alto,04-23-1984
Alice,2368,columbus,03/11/1988
Ted,2368,Portland,06-11-1992
Xiao,2571,Carson,07/06/1975
Answered 2012-10-29T00:32:07.103
1

The easiest way I found is to reformat the output with awk, prepending a new column that sort will use, and then use awk again to hide that column.

$ cat test.dat
Name:Alice,ID:2368,Hometown:columbus,bithday:03/11/1988
Name:Ted,ID:2368,Hometown:Portland,bithday:06-11-1992
Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984
Name:Xiao,ID:2571,hometown:Carson,bithday:07/06/1975
Name:Rain,ID:0264,hometown:little stone,bithday:11-09-1982
Name:Susan,ID:1261,Hometown:Menlo park,bithday:12-13-1989
Name:Zack,ID:1594,Hometown:columbus,bithday:02-04-1984

$ cat test.dat| awk -F, '{ gsub(/[a-zA-Z]+:/, ""); print $2,$0; }' | sort | awk '{ $1=""; print; }'
 Rain,0264,little stone,11-09-1982
 Susan,1261,Menlo park,12-13-1989
 Zack,1594,columbus,02-04-1984
 Mark,2218,Palo Alto,04-23-1984
 Alice,2368,columbus,03/11/1988
 Ted,2368,Portland,06-11-1992
 Xiao,2571,Carson,07/06/1975

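-F specifies the delimiter (here ,). We then remove the column names (that is, any run of letters followed by :), and finally print the ID column followed by the whole rewritten line. Then we use sort, which by default starts comparing at the first column, and awk again to show only the second part of each line.

Edit: because of the spaces in the city names, awk had a problem with the output. To keep things simple, I just blank out the first field (the column you want to hide) and print the whole line.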
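
The pattern described here is sometimes called decorate-sort-undecorate. As a hedged sketch of a variant that avoids the leading space in the output, the sort key can be separated from the record with a tab and removed with cut instead of a second awk. This assumes the data contains no tab characters; the function name dsu_sort is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decorate each line with its ID, sort, undecorate.
dsu_sort() {
  awk -F, '{
    gsub(/[a-zA-Z]+:/, "")      # strip the keys; $0 is re-split on ","
    printf "%s\t%s\n", $2, $0   # decorate: ID, a tab, then the cleaned line
  }' "$1" |
    LC_ALL=C sort -n |          # numeric sort on the leading ID
    cut -f2-                    # undecorate: keep everything after the first tab
}

# Three sample records from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name:Mark,ID:2218,Hometown:Palo Alto,bithday:04-23-1984
Name:Rain,ID:0264,hometown:little stone,bithday:11-09-1982
Name:Susan,ID:1261,Hometown:Menlo park,bithday:12-13-1989
EOF
dsu_sort "$tmp"
rm -f "$tmp"
```

Using a tab as the separator means cut can drop the decoration without touching the spaces inside values such as "little stone".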

Answered 2012-10-28T19:11:02.397
0

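Once you have formatted it the way you want (which I understand you have), you can sort it by piping the data to sort -t, -k2.

If you haven't actually done that yet, I think one of the simplest ways is sed 's/[[:alnum:]]*://g'.

So the whole command would be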

sed 's/[[:alnum:]]*://g' table.csv | sort -t, -k2
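
One caveat worth sketching: -k2 compares text from the second field to the end of the line, which works for the question's data only because every ID is zero-padded to four digits. The example below uses made-up, unpadded IDs (an assumption, not the question's data) to show how -k2 and -k2,2n can differ. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: two ways to sort already-formatted records on stdin.
sort_by_key_text() {
  LC_ALL=C sort -t, -k2      # text comparison from field 2 to end of line
}
sort_by_key_numeric() {
  LC_ALL=C sort -t, -k2,2n   # numeric comparison on field 2 only
}

# Assumed sample with an unpadded ID (264 instead of 0264).
sample='Susan,1261,Menlo park,12-13-1989
Rain,264,little stone,11-09-1982'

printf '%s\n' "$sample" | sort_by_key_text      # "1261" sorts before "264" as text
printf '%s\n' "$sample" | sort_by_key_numeric   # 264 sorts before 1261 as a number
```

With the zero-padded IDs in the question both forms order the IDs the same way, so the numeric key only matters if the padding is ever dropped.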
Answered 2012-10-28T21:39:07.740
-1

cat temp.txt | awk -F",|:" '{print $2","$4","$6","$8}' | sort -t, -k2n

Answered 2013-08-03T04:12:04.183