
I need your help editing my awk script. Here is the original version:

BEGIN { printf ("CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1\n")
maxatoms=1000
natom=0
found_struct = 0
found_bond = 0
}
{
if( NF == 5 )
{
foundff=0
natom++
fftype[natom]="UNKNOWN"
if ($1 ~ /CT/)
{
fftype[natom] = "C"
foundff=1
}
else if ($1 ~ /OH/)
{
fftype[natom] = "O"
foundff=1
}
else if ($1 ~ /HC/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /N/)
{
fftype[natom] = "N"
foundff=1
}

else if ($1 ~ /H1/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /HO/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 = "C")
{
fftype[natom] = "C"
foundff=1
}
else if ($1 = "O")
{
fftype[natom] = "O"
foundff=1
}

next

x[natom] = $1
y[natom] = $2
z[natom] = $3


if (foundff == 0)
printf("PROBLEM : Atom ff type %s not known\n", $6)
}

}

END {
for (iatom=1; iatom <= natom; iatom++)
{
printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,
iatom, fftype[iatom], iatom, x[iatom], y[iatom], z[iatom])
}
printf ("END\n")
}

This is the kind of file I'm working with:

0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861

I want this as the output:

CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861

But the coordinates (the line right after CT_1 1 12.011000 0.061000 1.087513) are not being picked up correctly. Could you take a look and suggest a solution?
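For comparison, here is a minimal sketch of the script above with its main bugs fixed, assuming the layout shown (five header lines, then alternating atom-record and coordinate lines). In the original, next runs before x, y and z are stored; the coordinates sit on the line after the 5-field record, not in $1..$3 of that record; $1 = "C" is an assignment (always true) rather than a comparison; $6 does not exist on a 5-field line; and the first header line 0 3 186 200 75202 also has 5 fields. The function name to_pdb, the pending flag and the file name atoms.txt are hypothetical. Coordinates are printed with %s so they come out exactly as in the input; the END line is kept from the original script.

```shell
#!/bin/sh
# Hypothetical helper: converts the question's format into PDB-style lines.
to_pdb() {
    awk '
    BEGIN {
        print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1"
        natom = 0
        pending = 0   # 1 while the coordinate line of the current atom is expected
    }
    # Skip the five header lines; the first one also has NF == 5.
    NR <= 5 { next }
    # The line after an atom record holds x, y, z: store them before moving on.
    pending {
        x[natom] = $1; y[natom] = $2; z[natom] = $3
        pending = 0
        next
    }
    NF == 5 {
        natom++
        type = $1
        sub(/_.*/, "", type)                # "CT_1" -> "CT"
        # Compare with ==; a single = would assign and always be true.
        if (type == "CT" || type == "C")                       fftype[natom] = "C"
        else if (type == "OH" || type == "O")                  fftype[natom] = "O"
        else if (type == "HC" || type == "H1" || type == "HO") fftype[natom] = "H"
        else if (type == "N")                                  fftype[natom] = "N"
        else {
            fftype[natom] = "UNKNOWN"
            printf("PROBLEM : Atom ff type %s not known\n", $1) > "/dev/stderr"
        }
        pending = 1
        next
    }
    END {
        # %s keeps the coordinates exactly as they appear in the input.
        for (i = 1; i <= natom; i++)
            printf("HETATM %d %s %d %s %s %s\n", i, fftype[i], i, x[i], y[i], z[i])
        print "END"
    }' "$1"
}

# Sample input copied from the question.
cat > atoms.txt <<'EOT'
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
EOT

to_pdb atoms.txt
```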


3 Answers


I wouldn't use getline; try this:

awk '/^(H[1CO]|N|C|O)/{i++;printf "HETATM %d %s %d ",i,substr($1,1,1),i;p=1;next}p{print;p=0}' file
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861

Just add a BEGIN block to print the header and you should be all set:

BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
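Combining that BEGIN block with the one-liner gives a complete command. The sketch below is one way to write it, assuming the input is saved as file: it spells the character class H[1CO] (letter O) so that HO atoms match, increments i before the printf instead of relying on argument evaluation order, and resets p after each coordinate line so that only the line directly after an atom record is printed. The function name hetatm_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner plus its BEGIN block.
# An atom record prints "HETATM n T n " without a newline; the next line
# (the coordinates) is then printed to complete that output line.
hetatm_lines() {
    awk 'BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
    /^(H[1CO]|N|C|O)/ { i++; printf "HETATM %d %s %d ", i, substr($1, 1, 1), i; p = 1; next }
    p { print; p = 0 }' "$1"
}

# Sample input from the question, saved under the name the answer uses.
cat > file <<'EOT'
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
EOT

hetatm_lines file
```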
answered 2013-04-17T10:41:13.580

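It's not entirely clear how you want to handle the "atoms", but I'd suggest using the getline command to fetch the next line whenever a CT_1 is found, so the coordinates can be processed immediately. The description also doesn't make clear whether the first field always contains an _ followed by a number; I've assumed there is an _.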

Something like this:

awk 'BEGIN { print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1" }
NR < 6 {next}
/^(CT|OH|HC|N|H1|HO|C|O)_/{a=$1;getline;++n;print "HETATM",n,substr(a,1,1),n,$1,$2,$3;next}
{ print "Bad line! ("$0")" }
' <<EOT
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
OH_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
HC_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
QW_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
EOT

Output:

CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861
HETATM 4 O 4 -5.804493575 4.589489777 -8.369482861
HETATM 5 H 5 -5.804493575 4.589489777 -8.369482861
Bad line! (QW_1 3 12.011000 0.061000 0.581330)
Bad line! (-5.804493575 4.589489777 -8.369482861)
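One caveat with the getline approach: getline returns 1 on success, 0 at end of file and -1 on error, and when it fails $0 keeps its old value, so an atom record on the very last line would print its own first three fields as coordinates. A minimal sketch that checks the return value follows; the function name pair_atoms and the files atoms.txt and truncated.txt are hypothetical, and the truncated file is produced by dropping the last coordinate line.

```shell
#!/bin/sh
# Hypothetical helper: same logic as the answer, but checks getline's result.
pair_atoms() {
    awk 'NR < 6 { next }
    /^(CT|OH|HC|N|H1|HO|C|O)_/ {
        a = $1
        # getline returns 1 on success, 0 at end of file, -1 on error.
        if ((getline) <= 0) {
            print "Missing coordinates for " a > "/dev/stderr"
            exit 1
        }
        ++n
        print "HETATM", n, substr(a, 1, 1), n, $1, $2, $3
        next
    }
    { print "Bad line! (" $0 ")" }' "$1"
}

cat > atoms.txt <<'EOT'
0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861
EOT
# Drop the final coordinate line to simulate a truncated file.
head -n 10 atoms.txt > truncated.txt

pair_atoms atoms.txt
pair_atoms truncated.txt || echo "truncated input detected"
```

On the truncated file the first two atoms are still printed, then the helper reports the missing coordinates on stderr and returns a non-zero status instead of emitting a bogus third line.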
answered 2013-04-17T09:11:20.850
 perl -ane ' if ($printNow == 1) {printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,$i,$type,$i,$F[0],$F[1],$F[2]);$printNow =0;}; if (scalar @F == 5 and (/^CT/ or /^OH/ or /^HC/ or /^N/)) {$i++; $printNow =1 ; $type =substr($_,0,1)}' filename

Hope this works.

answered 2013-04-17T10:28:15.687