Basically (tongue in cheek), I have a CSV file in the following format:

"ID","Name","Phone Number"
"00001","Ricky Stallman","07771111111"
"00003","Harrison Ford","07701010101"
"00003","Harrison Ford",""
"00008","Bob Geldof","07712121212"

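The 'Harrison Ford' entry appears a second time in my CSV with no number next to it (this is just the annoying way the data is presented to me). I need the CSV to read like this (that is, the number from the row above is copied into the empty field below):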

"ID","Name","Phone Number"
"00001","Ricky Stallman","07771111111"
"00003","Harrison Ford","07701010101"
"00003","Harrison Ford","07701010101"
"00008","Bob Geldof","07712121212"

Does anyone have a suggestion, preferably in Bash?

2 Answers

Try this:

awk -F',' '$3!~/""/{nbr=$3} {print $1","$2","nbr}' file

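If the third column is "", the last valid value seen is printed in its place. Note that this splits on every comma, so it assumes no field contains an embedded comma.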
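As a rough sketch of the same idea with an exact comparison instead of a regex match, the one-liner can be wrapped in a small shell function. The function name `fill_phone` and the variable `last` are made up for illustration, and the sketch still assumes that no field contains an embedded comma.

```shell
#!/bin/sh
# fill_phone (hypothetical name): forward-fill an empty third column with
# the most recent non-empty value. Reads the named files, or stdin if none.
# Assumption: fields never contain commas, so splitting on "," is safe.
fill_phone() {
    awk -F',' -v OFS=',' '
        # Remember the third field whenever it is not exactly two quotes.
        $3 != "\"\"" { last = $3 }
        # Print the first two fields followed by the remembered value.
        { print $1, $2, last }
    ' "$@"
}

# Sample data from the question, fed on stdin.
fill_phone <<'EOF'
"ID","Name","Phone Number"
"00001","Ricky Stallman","07771111111"
"00003","Harrison Ford","07701010101"
"00003","Harrison Ford",""
"00008","Bob Geldof","07712121212"
EOF
```

On the sample data this should reproduce the desired output from the question. Because only the previous row is remembered, an empty field is filled from whatever row precedes it, even if that row belongs to a different person.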

Answered 2013-08-26T09:57:12.633

Here is a gawk solution:

#!/usr/bin/gawk -f

match($0, /"([^\"]*)".*,"([^"]*)","([^"]*)"/, t) {
    key = t[1] "|" t[2]  ## Or just key = t[2] to be less strict.
    if (!(t[3] == "" && key in a)) {
        a[key] = t[3]
    }
    printf "\"%s\",\"%s\",\"%s\"\n", t[1], t[2], a[key]
}

Condensed:

gawk 'match($0, /"([^\"]*)".*,"([^"]*)","([^"]*)"/, t) { key = t[1] "|" t[2]; if (!(t[3] == "" && key in a)) a[key] = t[3]; printf "\"%s\",\"%s\",\"%s\"\n", t[1], t[2], a[key] }' file

Output:

"ID","Name","Phone Number"
"00001","Ricky Stallman","07771111111"
"00003","Harrison Ford","07701010101"
"00003","Harrison Ford","07701010101"
"00008","Bob Geldof","07712121212"
Answered 2013-08-26T10:17:30.690
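The gawk answer above remembers the number per ID and name rather than per previous row. The sketch below shows that behaviour in plain awk without gawk's three-argument match(). The names `fill_by_key` and `seen` are made up for illustration, and it splits naively on commas, so it assumes no field contains one.

```shell
#!/bin/sh
# fill_by_key (hypothetical name): fill an empty third column from an
# earlier row that has the same ID and name.
# Assumption: fields never contain commas, so splitting on "," is safe.
fill_by_key() {
    awk -F',' -v OFS=',' '
        {
            key = $1 SUBSEP $2
            # Store the value unless it is empty and the key is already known.
            if ($3 != "\"\"" || !(key in seen))
                seen[key] = $3
            print $1, $2, seen[key]
        }
    ' "$@"
}

# The empty Harrison Ford row follows a different person here.
fill_by_key <<'EOF'
"ID","Name","Phone Number"
"00003","Harrison Ford","07701010101"
"00008","Bob Geldof","07712121212"
"00003","Harrison Ford",""
EOF
```

In this input the last row should receive Harrison Ford's own number, whereas an approach that only remembers the previous row would copy Bob Geldof's.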