
With reference to the question "Calculating the distance between atomic coordinates", the input is:

ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C 
ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C
ATOM    939  CA  ASN A 205      39.443 -17.018  11.206  1.00 54.49           C  
ATOM    947  CA  GLU A 206      41.454 -13.901  10.155  1.00 26.32           C
ATOM    956  CA  VAL A 207      43.664 -14.041  13.279  1.00 40.65           C 
.
.
.

ATOM    963  CA  GLU A 208      45.403 -17.443  13.188  1.00 40.25           C  

One answer given there was:

use strict;
use warnings;

my @line;
while (<>) {
    push @line, $_;            # add line to buffer
    next if @line < 2;         # skip unless buffer is full
    print proc(@line), "\n";   # process and print 
    shift @line;               # remove used line 
}

sub proc {
    my @a = split ' ', shift;   # line 1
    my @b = split ' ', shift;   # line 2
    my $x = ($a[6]-$b[6]);      # calculate the diffs
    my $y = ($a[7]-$b[7]);
    my $z = ($a[8]-$b[8]);
    my $dist = sprintf "%.1f",                # format the number
                   sqrt($x**2+$y**2+$z**2);   # do the calculation
    return "$a[3]-$b[3]\t$dist"; # return the string for printing
}

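The code above prints the distance from the first CA to the second CA, from the second to the third, and so on.

How can this code be modified to find the distances from the first CA to all remaining CAs (2, 3, ...), from the second CA to all remaining CAs (3, 4, ...), and so on, printing only those shorter than 5 angstroms? I realize that the statement push @line, $_; has to change so that the array grows, but it is not clear to me how to do that.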


2 Answers

You can try this:

use strict;
use warnings;

my @alllines = ();
while(<DATA>) {  push(@alllines, $_);  }

# Each current line (the last line has no successor, so stop one short)
for (my $i = 0; $i < $#alllines; $i++)
{
    # Each following line
    for (my $j = $i + 1; $j <= $#alllines; $j++)
    {
        if ($alllines[$i])
        {
            # Split the line on whitespace (this also drops the trailing newline)
            my ($line1_tb_1, $line1_tb_2, $line1_tb_3) = split ' ', $alllines[$i];
            print "Main_Line: $line1_tb_1\t$line1_tb_2\t$line1_tb_3\n";
            if ($alllines[$j])
            {
                # Split the following line the same way
                my ($line_nxt_tb1, $line_nxt_tb2, $line_nxt_tb3) = split ' ', $alllines[$j];

                print "Next_Line: $line_nxt_tb1\t$line_nxt_tb2\t$line_nxt_tb3\n";

                # Do your own calculation or regex matching here
            }
        }
        # system 'pause';   # for testing only
    }
}

__DATA__
tab1    123 456
tab2    789 012
tab3    345 678
tab4    901 234
tab5    567 890
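
To connect the toy tab data above to the ATOM records in the question, here is a minimal sketch of what could go in the "do your own calculation" spot. The helper names parse_atom and distance are hypothetical, and the field positions (3 for the residue name, 5 for the residue number, 6 to 8 for x, y, z) are the same assumption the question's code makes, namely that every field is separated by whitespace. Parsing each line once up front avoids splitting the same line again in every pass of the inner loop.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one whitespace-separated ATOM line into a record.
# Assumes field 3 = residue name, field 5 = residue number, fields 6..8 = x, y, z.
sub parse_atom {
    my ($line) = @_;
    my @f = split ' ', $line;
    return { res => $f[3], num => $f[5], xyz => [ @f[6 .. 8] ] };
}

# Hypothetical helper: Euclidean distance between two parsed records.
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->{xyz}[$_] - $q->{xyz}[$_]) ** 2 for 0 .. 2;
    return sqrt $sum;
}

# Two records copied from the question, parsed once.
my @atoms = map { parse_atom($_) } (
    'ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C',
    'ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C',
);

# Print "RES1NUM1-RES2NUM2<TAB>distance" for the pair.
printf "%s%s-%s%s\t%.1f\n",
    $atoms[0]{res}, $atoms[0]{num},
    $atoms[1]{res}, $atoms[1]{num},
    distance(@atoms[0, 1]);
```

With the records stored this way, the inner loop only needs to call distance on $atoms[$i] and $atoms[$j].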

I hope this helps.

answered 2016-12-16T07:35:17.290

To get the pairs, read the file into an array, @data_array, then loop over the entries.

Update: added opening the file and loading @data_array.

open my $fh, '<', 'atom_file.pdb' or die $!;

my @data_array = <$fh>;

close $fh or die $!;

for my $i (0 .. $#data_array) {
    for my $j ($i+1 .. $#data_array) {
        process(@data_array[$i,$j]);    
    }   
}
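
The loop above calls process without defining it. One possible body, offered as a sketch and not as the answerer's own code, reuses the distance calculation from the question and applies the 5 angstrom cutoff the question asks for. The variable $CUTOFF, the return convention (a string for a hit, nothing otherwise), and the in-memory sample records standing in for atom_file.pdb are all assumptions made for illustration.

```perl
use strict;
use warnings;

my $CUTOFF = 5.0;   # assumed threshold in angstroms, taken from the question

# Hypothetical body for process(): returns "RES1NUM1-RES2NUM2<TAB>distance"
# when the two atoms are closer than $CUTOFF, and nothing otherwise.
sub process {
    my @a = split ' ', shift;   # first line of the pair
    my @b = split ' ', shift;   # second line of the pair
    my $dist = sqrt(   ($a[6] - $b[6]) ** 2
                     + ($a[7] - $b[7]) ** 2
                     + ($a[8] - $b[8]) ** 2 );
    return unless $dist < $CUTOFF;
    return sprintf "%s%s-%s%s\t%.1f", @a[3, 5], @b[3, 5], $dist;
}

# Three records from the question stand in for the file contents.
my @data_array = (
    'ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C',
    'ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C',
    'ATOM    939  CA  ASN A 205      39.443 -17.018  11.206  1.00 54.49           C',
);

# Visit every unordered pair exactly once and print only the close ones.
for my $i (0 .. $#data_array) {
    for my $j ($i + 1 .. $#data_array) {
        my $result = process(@data_array[$i, $j]);
        print "$result\n" if defined $result;
    }
}
```

Splitting on whitespace matches the question's code but relies on every field being separated by at least one space. Returning the string from process, as opposed to printing inside it, keeps the cutoff logic easy to check on its own.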
answered 2016-12-15T23:25:32.470