bash - Shell or AWK script to group duplicate entries by the first field and find the difference using the last field

Question

I want to write a script (shell script or awk) that prints all lines with the same first field ($1) together, and then uses the last field to find the difference between the last and the first entry, or perhaps records the difference at every duplicate entry.

For example, my file has the following entries:

counter1 is 100
counter2 is 200
counter3 is 300
counter1 is 1000
counter2 is 2000
counter3 is 3000
counter1 is 10000
counter2 is 20000
counter3 is 30000

I want to print:

counter1 is 100
counter1 is 1000
counter1 is 10000
counter2 is 200
counter2 is 2000
counter2 is 20000
counter3 is 300
counter3 is 3000
counter3 is 30000

Each counter's value keeps increasing, so I want to find the difference between each value and the first value of the same counter:

counter1 is 100
counter1 is 1000 | difference 1000-100 = 900
counter1 is 10000 | difference 10000-100 = 9900

I was able to print the duplicate entries but not to group them; they appear in the same order as in the file:

MacBook-Air:linuxscripts jimdev$ awk 'NR==FNR && a[$1]++ {b[$1];next} $1 in b' FS=" " countr.txt countr.txt 

counter1 is 100
counter2 is 200
counter3 is 300
counter1 is 1000
counter2 is 2000
counter3 is 3000
counter1 is 10000
counter2 is 20000
counter3 is 30000
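The attempt above prints lines in file order because awk emits each matching line as soon as it reads it. To group lines without sort, awk has to buffer them and print them in its END block. A minimal sketch, assuming the whole file fits in memory; group_by_first_field is a hypothetical helper name, and groups come out in the order in which each first field first appears:

```shell
# group_by_first_field is a hypothetical helper: it groups lines by their
# first field (in order of first appearance, no sort needed) and appends the
# difference between each repeated value and the first value of that key.
group_by_first_field() {
  awk '
    NF == 0 { next }                          # skip blank lines
    !($1 in count) { order[++nkeys] = $1 }    # record first-appearance order
    {
      n = ++count[$1]                         # how many lines this key has so far
      line[$1, n] = $0                        # buffer the whole line
      val[$1, n] = $NF                        # and its last field
    }
    END {
      for (k = 1; k <= nkeys; k++) {
        key = order[k]
        print line[key, 1]                    # first entry: print unchanged
        for (i = 2; i <= count[key]; i++) {
          d = val[key, i] - val[key, 1]       # difference from the first value
          print line[key, i] " | difference " val[key, i] "-" val[key, 1] " = " d
        }
      }
    }' "$@"
}
```

Usage: group_by_first_field countr.txt. Because nothing is sorted, the lines within each group keep their original file order, so "first" always means the first occurrence in the file.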

score 2 · Accepted Answer

Does this work for you?

sort countr.txt | grep -v '^$' | awk '
BEGIN { field1 = ""; firstval = 0 }
     # new counter: remember its first value and print the line as is
     $1 != field1 { print $0; field1 = $1; firstval = $NF; next }
     # same counter: append the difference from its first value
     { print $0 " | difference " $NF "-" firstval " = " ($NF - firstval) }'

And here is the output for the input file shown in your post:

counter1 is 100
counter1 is 1000 | difference 1000-100 = 900
counter1 is 10000 | difference 10000-100 = 9900
counter2 is 200
counter2 is 2000 | difference 2000-200 = 1800
counter2 is 20000 | difference 20000-200 = 19800
counter3 is 300
counter3 is 3000 | difference 3000-300 = 2700
counter3 is 30000 | difference 30000-300 = 29700
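One caveat: plain sort compares whole lines, so within a counter the values end up ordered as strings, not in file order. That happens to work for this sample (100, 1000, 10000 are also in order as strings), but if counter1 went from 900 to 1000, sort would put the 1000 line first and the output would report 900-1000 = -100. A sketch of a variant that sorts on the first field only and keeps file order within each group; it assumes a sort that supports -s (GNU and BSD sort do, POSIX does not require it), and diff_from_first is a hypothetical name:

```shell
# diff_from_first is a hypothetical helper: it sorts on the first field only,
# keeps file order within each group (-s), and appends the difference between
# each repeated value and the first value of that counter.
diff_from_first() {
  # LC_ALL=C: byte-wise comparison; -k1,1: compare the first field only;
  # -s: stable sort (GNU/BSD extension), so equal keys keep their input order.
  LC_ALL=C sort -s -k1,1 "$@" | awk '
    NF == 0 { next }                                   # skip blank lines
    $1 != key { key = $1; first = $NF; print; next }   # first line of a group
    { print $0 " | difference " $NF "-" first " = " ($NF - first) }'
}
```

Usage: diff_from_first countr.txt. Comparing with != instead of a regex match also keeps counter1 and counter10 in separate groups.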

score 1 · Accepted Answer

Assuming your data is in a file called data.txt, you can get this with sort and a simple if in awk (or with patterns):

sort data.txt | awk 'BEGIN{last = ""; value = 0;} {if ($1 == last) {print $1" is "$3" | difference "$3"-"value" = "($3-value)}else{last = $1; value = $3; print $1" is "$3;}}' -

Explanation: first, sort the input so that the lines for each counter are grouped together in ascending order. Then comes the awk program:

We use two temporary variables: last, which stores the current counter, and value, which stores the first value of that counter. Both are initialized in the BEGIN block: BEGIN{last = ""; value = 0;}.

Now, for each line we execute the following code:

if ($1 == last) {
    print $1" is "$3" | difference "$3"-"value" = "($3-value);
} else {
    last = $1;
    value = $3;
    print $1" is "$3;
}

line 1: compare the first field (the counter) with last, which holds the previous counter tag, to decide whether to print a difference.

line 2: if the current line has the same counter tag as the previous one, print the difference.

line 3: otherwise, this is the first line of a new counter (the base case), so save the counter tag for comparison with the next line, save its value for computing later differences, and print the line.

In short: if the new line has the same counter tag as the previous one, we keep the stored values and calculate the difference from the first value of this counter. Otherwise, we store the new counter tag (in last) and its value (in value) and just print the line.

Here is the output for your input sample:

counter1 is 100
counter1 is 1000 | difference 1000-100 = 900
counter1 is 10000 | difference 10000-100 = 9900
counter2 is 200
counter2 is 2000 | difference 2000-200 = 1800
counter2 is 20000 | difference 20000-200 = 19800
counter3 is 300
counter3 is 3000 | difference 3000-300 = 2700
counter3 is 30000 | difference 30000-300 = 29700
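The question also mentions noting the difference at every duplicate entry. A minimal sketch of that variant: each repeated line is compared with the previous value of the same counter rather than the first one, so for the sample counter1 gives 1000-100 = 900 and then 10000-1000 = 9000. It groups lines with a stable sort on the first field so each counter's lines stay in file order; this assumes sort supports -s (GNU and BSD sort do, POSIX does not require it), and diff_from_previous is a hypothetical name:

```shell
# diff_from_previous is a hypothetical helper: it groups lines by their first
# field with a stable sort, and each repeated line shows the difference from
# the previous value of the same counter.
diff_from_previous() {
  # -s (stable sort) is a GNU/BSD extension, not POSIX; LC_ALL=C compares bytes.
  LC_ALL=C sort -s -k1,1 "$@" | awk '
    NF == 0 { next }                                   # skip blank lines
    $1 != last { last = $1; prev = $NF; print; next }  # first line of a group
    {
      print $0 " | difference " $NF "-" prev " = " ($NF - prev)
      prev = $NF                                       # compare the next line with this one
    }'
}
```

Usage: diff_from_previous countr.txt.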
