3

我有一个文件,我已将其修剪为如下所示:

"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

等等

我想最终得到的是每个城市的总金额的总和。那是:

"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"

等等。

公平警告,数据集不一定是连续的——也就是说,一个城市可能在这里出现一次,往下千行一次,最后再出现3次。

我一直在尝试使用以下 awk 脚本:

awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv

我得到的结果如下所示:

"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0

等等。第二列全为零,没有引号。

我显然错过了一些东西,但我不知道该看什么或在哪里看。我错过了什么?

谢谢。

4

4 回答 4

3

你失败的原因是因为双引号。

做这样的事情:

sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\""  i "\""  ","  "\"" array[i] "\"" }}' 

"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
于 2013-10-03T18:27:46.490 回答
2

这个 awk 单行器将提供您想要的格式:

awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' file

用你的数据测试:

kent$  cat f
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

kent$  awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' f
"Lakewood","150.00"
"Reno","220.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
"Altamonte Springs","100.25"
于 2013-10-03T18:31:56.897 回答
1

"导致您的输入出现问题。首先使用删除它们并使用内部sed打印回来printfawk

尝试以下操作:

sed 's/"//g' input.csv | awk -F "," '{array[$1]+=$2} END { for (i in array) {printf "\"%s\",\"%\"\n", i, array[i]}}' > output.csv

混乱的输入

"Reno","40.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Reno","80.00"
"Sandpoint","50.00"
"Reno","40.00"
"Lenoir City","987.00"
"Altamonte Springs","25.00"

输出

"Reno","220.00"
"Altamonte Springs","100.25"
"Lakewood","150.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
于 2013-10-03T18:30:34.170 回答
1

您不需要预处理或讨厌的转义:

$ awk -F'"' '{a[$2]+=$4}END{for(k in a)printf "%s,%s\n",FS k FS,FS a[k] FS}' file
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
于 2013-10-03T18:48:07.890 回答