I have this PDB file and I want to calculate the distance between each of atoms 7 and 8 (the serial number in $2) and each of atoms 12, 14, 15, 17 and 18. If the distance is less than 5 angstroms, the value should be printed.
ATOM 1 N ASN p 140 38.455 18.232 -3.207 1.00 7.39 N
ATOM 2 CA ASN p 140 37.856 18.151 -4.534 1.00 7.91 C
ATOM 3 C ASN p 140 38.700 18.848 -5.595 1.00 10.75 C
ATOM 4 O ASN p 140 39.797 19.271 -5.313 1.00 9.25 O
ATOM 5 CB ASN p 140 36.435 18.715 -4.446 1.00 7.62 C
ATOM 6 CG ASN p 140 35.556 17.898 -3.501 1.00 6.82 C
ATOM 7 OD1 ASN p 140 35.269 18.315 -2.323 1.00 8.53 O
ATOM 8 ND2 ASN p 140 35.197 16.691 -3.945 1.00 5.41 N
TER 9 ASN 140
HETATM 10 C 08H p 1 29.121 15.727 -1.182 1.00 5.89 C
HETATM 11 C 08H p 1 29.763 16.230 -0.040 1.00 5.86 C
HETATM 12 N 08H p 1 31.023 16.810 -0.046 1.00 6.15 N
HETATM 13 C 08H p 1 31.533 17.872 0.633 1.00 6.24 C
HETATM 14 N 08H p 1 32.815 18.037 0.299 1.00 6.83 N
HETATM 15 N 08H p 1 33.151 17.112 -0.526 1.00 7.37 C
HETATM 16 C 08H p 1 32.058 16.349 -0.758 1.00 7.06 C
HETATM 17 O 08H p 1 31.956 15.215 -1.730 1.00 8.15 O
HETATM 18 N 08H p 1 30.979 15.691 -2.746 1.00 10.31 N
HETATM 19 C 08H p 1 29.651 15.777 -2.509 1.00 6.71 C
HETATM 20 O HOH p 170 34.699 19.032 2.134 1.00 6.42 O
Based on a similar script, I wrote this code:
# usage: awk -f test.awk structure.pdb
BEGIN { print "asparagine and ligand in the structure..."; ORS="" }
$1=="ATOM" && $3~"ND2|OD1" && $4=="ASN" || $1=="HETATM" && $12~"N|O" && $4!~"HOH" {
    print $2,$3,$4,$6"\n"
    atm_x[$2]=$7; atm_y[$2]=$8; atm_z[$2]=$9
}
END {
    ORS="\n"
    for (key1 in atm_x) {
        list=list" "key1
        for (key2 in atm_x) {
            if (index(list, key2) != 0) continue
            dx=atm_x[key1]-atm_x[key2]
            dy=atm_y[key1]-atm_y[key2]
            dz=atm_z[key1]-atm_z[key2]
            distance=sqrt(dx^2+dy^2+dz^2)
            if (distance < 5 && distance != 0) {
                i++
                candidate[i]=key1"-"key2": "distance
            }
        }
    }
    print "\nCandidates ..."
    for (keys in candidate) { print candidate[keys] }
}
When I run this script, I get the following result:
asparagine and ligand in the structure...
7 OD1 ASN 140
8 ND2 ASN 140
12 N 08H 1
14 N 08H 1
17 O 08H 1
18 N 08H 1
Candidates ...
7-8: 2.2964
7-14: 3.60198
7-17: 4.57576
8-17: 4.19391
8-18: 4.49768
12-14: 2.19905
12-17: 2.50007
12-18: 2.92303
14-17: 3.58028
14-18: 4.25989
17-18: 1.48774
The problem is that I don't want to print distances between atoms that have the same residue name ($4). I'm new to awk and was wondering what the best way to handle this is. Any suggestions would be appreciated!
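One possible approach, as a minimal sketch rather than a definitive answer: store the residue name ($4) next to the coordinates and skip any pair whose residue names are equal. The function name cross_residue_distances, the arrays res and ids, and the file sample.pdb are made up for illustration. The anchored regular expressions are a slightly stricter variant of the original patterns. The sketch also pairs atoms by their position in the input instead of using index(), because index() matches substrings (a serial number such as 1 would be found inside 12). The awk program is wrapped in a small shell function here, but the part between the single quotes could equally be saved as test.awk and run with awk -f.

```shell
#!/bin/sh
# Hypothetical sample file: a trimmed copy of the coordinates from the question.
cat > sample.pdb <<'EOF'
ATOM 7 OD1 ASN p 140 35.269 18.315 -2.323 1.00 8.53 O
ATOM 8 ND2 ASN p 140 35.197 16.691 -3.945 1.00 5.41 N
HETATM 12 N 08H p 1 31.023 16.810 -0.046 1.00 6.15 N
HETATM 14 N 08H p 1 32.815 18.037 0.299 1.00 6.83 N
HETATM 15 N 08H p 1 33.151 17.112 -0.526 1.00 7.37 C
HETATM 17 O 08H p 1 31.956 15.215 -1.730 1.00 8.15 O
HETATM 18 N 08H p 1 30.979 15.691 -2.746 1.00 10.31 N
HETATM 20 O HOH p 170 34.699 19.032 2.134 1.00 6.42 O
EOF

# Print distances below 5 angstroms, but only for atoms in different residues.
cross_residue_distances() {
    awk '
    # Select ASN side-chain OD1/ND2 atoms and ligand N/O atoms (no water).
    ($1=="ATOM" && $3~/^(ND2|OD1)$/ && $4=="ASN") ||
    ($1=="HETATM" && $12~/^[NO]$/ && $4!="HOH") {
        x[$2]=$7; y[$2]=$8; z[$2]=$9
        res[$2]=$4      # residue name of this atom
        ids[++n]=$2     # serial numbers in input order
    }
    END {
        # Visit each unordered pair exactly once.
        for (i=1; i<n; i++) {
            for (j=i+1; j<=n; j++) {
                a=ids[i]; b=ids[j]
                if (res[a]==res[b]) continue    # same residue name: skip
                dx=x[a]-x[b]; dy=y[a]-y[b]; dz=z[a]-z[b]
                d=sqrt(dx*dx+dy*dy+dz*dz)
                if (d<5) print a "-" b ": " d
            }
        }
    }' "$1"
}

cross_residue_distances sample.pdb
```

With the atoms from the question, this should leave only the ASN-to-ligand pairs from the original output (7-14, 7-17, 8-17 and 8-18) and drop pairs such as 7-8 and 17-18. If two different residues could share a name, comparing the residue number ($6) or the chain as well would be a safer key.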