msg_type,mmsi,timestamp,imo,name,ship_and_cargo_type,length,width,draught,eta_date,destination
24,510041000,2016-07-05 12:49:16 UTC,,,30,29,6,,,
5,371952000,2016-07-16 07:30:40 UTC,9687112,SPRING
LEGEND,90,190,32,11.7,2016-08-08 00:00:00 UTC,"ONAHAMA,JAPAN"
5,412331087,2016-07-24 11:14:02 UTC,0,LU HUANG YUAN YU
117,30,0,0,0,,"" 5,775994600,2016-07-02 07:43:55 UTC,9318814,ELIZABETH
A MCCALL,60,44,9,3.5,2016-11-16 06:05:00 UTC,GUIRIA

I am trying to insert an empty column into this table just before the last field. For example, the header would look like this:

msg_type,mmsi,timestamp,imo,name,ship_and_cargo_type,length,width,draught,eta_date,,destination

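I am using an awk command, but it does not handle quoted fields correctly, for example "ONAHAMA,JAPAN", because the comma inside the quotes is treated as a field separator.

Is there a better way, and how can I get around this? Here is my attempt.

Thanks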

awk -F, -v OFS="," '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,","$11}' old_table > new_table

2 Answers

gawk solution:

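awk -v FPAT='"[^"]+"|[^,]*' '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,","$11}' OFS=',' old_table > new_table
  • -v FPAT='"[^"]+"|[^,]*' - a pattern that defines what a field value looks like: either a double-quoted string or a run of non-comma characters, which may be empty so that empty fields keep their position. FPAT is specific to gawk.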
answered 2017-07-13T15:57:53.883

This particular case can be solved with sed, but do look into perl, python, etc., which have csv modules.

$ sed -E 's/"[^"]+"$|[^,]*$/,&/' ip.txt
msg_type,mmsi,timestamp,imo,name,ship_and_cargo_type,length,width,draught,eta_date,,destination
24,510041000,2016-07-05 12:49:16 UTC,,,30,29,6,,,,
5,371952000,2016-07-16 07:30:40 UTC,9687112,,SPRING
LEGEND,90,190,32,11.7,2016-08-08 00:00:00 UTC,,"ONAHAMA,JAPAN"
5,412331087,2016-07-24 11:14:02 UTC,0,,LU HUANG YUAN YU
117,30,0,0,0,,"" 5,775994600,2016-07-02 07:43:55 UTC,9318814,,ELIZABETH
A MCCALL,60,44,9,3.5,2016-11-16 06:05:00 UTC,,GUIRIA
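  • -E - use extended regular expressions; some implementations use -r instead
  • "[^"]+"$|[^,]*$ - the last field, either enclosed in double quotes or else a run of non-, characters
  • ,& - replace with , followed by the matched text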
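
For the csv-module route, a minimal Python sketch might look like the following. The function name insert_empty_column and the sample rows are made up for illustration, and it assumes well-formed CSV where every record sits on its own line.

```python
import csv
import io


def insert_empty_column(src, dst, position=-1):
    """Copy CSV rows from src to dst, adding an empty field at `position`.

    position=-1 puts the new field just before the last one, because
    list.insert(-1, x) inserts x ahead of the final element.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if row:  # blank lines come back as [] and are passed through
            row.insert(position, "")
        writer.writerow(row)


if __name__ == "__main__":
    # Illustrative two-column excerpt, not the full table from the question.
    sample = (
        "eta_date,destination\n"
        '2016-08-08 00:00:00 UTC,"ONAHAMA,JAPAN"\n'
        "2016-11-16 06:05:00 UTC,GUIRIA\n"
    )
    out = io.StringIO()
    insert_empty_column(io.StringIO(sample, newline=""), out)
    print(out.getvalue(), end="")
```

For real files, open both with newline="" as the csv documentation recommends, e.g. open("old_table", newline="") and open("new_table", "w", newline=""). Note that csv.writer only quotes fields that need it, so a field written as "" in the input comes out as a plain empty field.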
answered 2017-07-13T14:52:05.877