unix - awk: remove duplicate fields and count them

Question

I have a problem that I am trying to solve using only awk.

I have a CSV file with the following structure:

Easting  Northing    Latitude    Longitude   Locality Name

Easting  "Northing"  "Latitude"  "Longitude"     "LocalityName"
364208  176288           51.48441   -2.51685     "Fishponds"
358596  172813           51.45278   -2.59726     "Bristol City Centre"
358886  177828           51.49789   -2.59367     "Southmead"
358839  177839           51.49798   -2.59435     "Southmead"
358980  177882           51.49838   -2.59232     "Southmead"
359009  177863           51.49821   -2.5919          "Southmead"
358839  177529           51.4952        -2.59431     "Southmead"
359475  168262           51.41192   -2.58409     "Hengrove Park"
358945  173526           51.45921   -2.59232     "Bristol"
358943  173525           51.4592    -2.59235     "Bristol"
358941  173524           51.45919   -2.59238     "Bristol"
358940  173523           51.45919   -2.59239     "Bristol"
358945  173528           51.45923   -2.59232     "Bristol"
358936  173520           51.45916   -2.59245     "Bristol"
358936  173521           51.45917   -2.59245     "Bristol"
358932  173516           51.45912   -2.5925          "Bristol"

And so on. I am trying to write an awk script that counts every instance of each Locality name and prints the totals, so the output would be:

Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8

So far I have this:

BEGIN { i = 0; state = 0; names[NR]; FS=","; }

{
#for each element in names array, check if already exists.
    for(j=0;j<=i;j++)
    {
        if(names[j] == $5)
        {
        state = 1;
        break;
        }
    }
# If the name doesnt already exist add to names array
    if(state == 0)
    {
        names[i] = $5;
        i++;
    }
    state = 0;
}

END { 
    for(x=0;x<=i;x++)
    {
    print names[x];
    }
}

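I was hoping this would sort the localities and remove the duplicates, but I still can't think of a good way to count the instances of each locality and then list them.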

score 5 · Accepted Answer

A simpler solution:

awk -F '"' 'NR>3 {locname[$2]++}
            END { for (n in locname) {print n, locname[n] } }' INPUTFILE

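First, the input field separator is set to `"`, so the second field is the locality name. `NR>3` skips the first three lines (the headers). An array keyed on the second field counts the occurrences. After the last line has been read, the `END` block prints each key of the array with its count.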
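Note that awk leaves the traversal order of `for (n in locname)` unspecified, so the lines can come out in any order. If a stable order matters, one option is to pipe the result through `sort`. A minimal sketch, assuming alphabetical order is acceptable; the `sample` and `counts` variable names and the shortened input are made up for illustration:

```shell
# Shortened sample input (an assumption), keeping the same three header lines.
sample='Easting  Northing    Latitude    Longitude   Locality Name

Easting  "Northing"  "Latitude"  "Longitude"     "LocalityName"
364208  176288           51.48441   -2.51685     "Fishponds"
358886  177828           51.49789   -2.59367     "Southmead"
358839  177839           51.49798   -2.59435     "Southmead"
358945  173526           51.45921   -2.59232     "Bristol"'

# Count per locality, then sort so the order no longer depends on array traversal.
counts=$(printf '%s\n' "$sample" | awk -F '"' '
    NR > 3 { locname[$2]++ }                          # skip headers, count by name
    END    { for (n in locname) print n, locname[n] } # order is unspecified here
' | sort)

printf '%s\n' "$counts"
```

With this sample it prints `Bristol 1`, `Fishponds 1` and `Southmead 2`, one per line.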

score 1 · Accepted Answer

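Here is one way using GNU awk. It reads the file twice (`file{,}` is shell brace expansion for `file file`), but prints the localities in the order in which they first appear in the input: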

awk -F "\"" 'NR > 3 && FNR==NR { a[$2]++; next } $2 in a && !b[$2]++ { print $2, a[$2] }' file{,}

Result:

Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8
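For readers unfamiliar with the two-pass idiom, the same command can be spread over several lines with comments. This is only a sketch: the temporary file, its shortened contents, and the `ordered` variable are assumptions for illustration, and the file name is passed twice explicitly because not every shell supports brace expansion.

```shell
# Hypothetical sample input with the same three header lines as the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Easting  Northing    Latitude    Longitude   Locality Name

Easting  "Northing"  "Latitude"  "Longitude"     "LocalityName"
364208  176288           51.48441   -2.51685     "Fishponds"
358886  177828           51.49789   -2.59367     "Southmead"
358839  177839           51.49798   -2.59435     "Southmead"
358945  173526           51.45921   -2.59232     "Bristol"
EOF

ordered=$(awk -F '"' '
    # First pass (FNR == NR holds only while reading the first file argument):
    # count each locality, skipping the three header lines.
    NR > 3 && FNR == NR { a[$2]++; next }
    # Second pass: print each counted locality once, in order of first appearance.
    $2 in a && !b[$2]++ { print $2, a[$2] }
' "$sample" "$sample")

printf '%s\n' "$ordered"
rm -f "$sample"
```

The header lines never match the second rule, because their second field (empty, or `Northing`) was never stored in `a`.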

score 0 · Accepted Answer

This might work for you:

awk -F\" '/^[0-9]/{if(!location){location=$2};if(location==$2){count++;next};print location,count;location=$2;count=1};END{print location,count}' file

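This only works when the localities are sorted, that is, when identical names sit on consecutive lines (as in your example); otherwise use: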

awk -F\" '/^[0-9]/{count[$2]++;if(count[$2]==1)location[++order]=$2};END{for(n=1;n<=order;n++)print location[n],count[location[n]]}' file
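To see why grouping matters, the two commands can be compared on input where a name is not contiguous. The sketch below is illustrative only: it reuses three rows from the question, deliberately reordered, and the `rows`, `grouped` and `totals` variable names are made up.

```shell
# Three rows from the question, reordered so that Southmead is not contiguous.
rows='358886  177828  51.49789  -2.59367  "Southmead"
358945  173526  51.45921  -2.59232  "Bristol"
358839  177839  51.49798  -2.59435  "Southmead"'

# Run-length version: starts a new count whenever the name changes.
grouped=$(printf '%s\n' "$rows" | awk -F '"' '
    /^[0-9]/ {
        if (!location) location = $2
        if (location == $2) { count++; next }
        print location, count
        location = $2; count = 1
    }
    END { print location, count }')

# Array version: totals per name, printed in order of first appearance.
totals=$(printf '%s\n' "$rows" | awk -F '"' '
    /^[0-9]/ { if (++count[$2] == 1) location[++order] = $2 }
    END { for (n = 1; n <= order; n++) print location[n], count[location[n]] }')

printf 'run-length:\n%s\narray:\n%s\n' "$grouped" "$totals"
```

On this input the run-length version reports `Southmead 1`, `Bristol 1`, `Southmead 1`, while the array version reports `Southmead 2` and `Bristol 1`.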

score 0 · Accepted Answer

A Perl solution:

perl -F\" -lane 'if($.>3){$X{$F[1]}++}END{foreach (keys %X){print $_." ".$X{$_}}}' your_file

Tested below:

> perl -F\" -lane 'if($.>3){$X{$F[1]}++}END{foreach (keys %X){print $_." ".$X{$_}}}' temp
Bristol 8
Hengrove Park 1
Southmead 5
Bristol City Centre 1
Fishponds 1
>
