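I'm a student in an introductory Perl course, looking for advice and feedback on my approach to writing a small (but tricky) program that analyzes data about atoms. My professor encourages the use of forums. I'm not familiar with Perl subroutines or modules (including Bioperl), so please keep responses at an appropriate "beginner level" so that I can understand and learn from your suggestions and/or code (and please limit the "magic" too).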
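The requirements for the program are as follows:
Read a file (containing data about atoms) named on the command line and create an array of atom records (one record/atom per line). For each record, the program needs to store:
• The atom's serial number (columns 7 - 11)
• The three-letter name of the amino acid it belongs to (columns 18 - 20)
• The atom's three coordinates (x, y, z) (columns 31 - 54)
• The atom's one- or two-letter element name (e.g. C, O, N, Na) (columns 77 - 78)
Prompt for one of three commands: freq, length, density d (d is some number):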
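• freq - how many atoms of each type are in the file (e.g. nitrogen, sodium, etc.), displayed like this: N:918 S:23
• length - the distance between coordinates (the maximum distance between any two atoms)
• density d (where d is a number) - the program will prompt for the name of a file to save the computations in. For each atom it computes the distance between that atom and every other atom; if that distance is less than or equal to the number d, it increments the count of atoms within that distance. The count is written to the file unless it is zero. The output looks like: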
1:5
2:3
3:6
... (very large file), and the file is closed when finished.
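The column layout in the requirements above could be read with substr, and the freq count kept in a hash. The following is only a minimal sketch under some assumptions: every ATOM/HETATM line is at least 78 characters long, the sample lines are built with sprintf purely so that the columns line up (a real program would read them from the file), and the hash keys x, y and z are names chosen here for illustration.

```perl
use strict;
use warnings;

# Build sample lines in the fixed-width layout (illustration only).
my $format = "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f%12s";
my @lines = (
    sprintf($format, 'ATOM', 4743, ' CG',  'GLN', 'A', 704, 19.896, 32.017, 54.717, 1.00, 66.44, 'C'),
    sprintf($format, 'ATOM', 4744, ' CD',  'GLN', 'A', 704, 19.589, 30.757, 55.525, 1.00, 73.28, 'C'),
    sprintf($format, 'ATOM', 4745, ' OE1', 'GLN', 'A', 704, 18.801, 29.892, 55.098, 1.00, 75.91, 'O'),
    'REMARK this line is not an atom record and is skipped',
);

my @recs;
foreach my $line (@lines) {
    # Keep only lines that start with ATOM or HETATM.
    next unless $line =~ /^(ATOM|HETATM)/;

    # substr(STRING, OFFSET, LENGTH): OFFSET counts from 0, so
    # column 7 is offset 6, column 18 is offset 17, and so on.
    my $element = substr($line, 76, 2);
    $element =~ s/\s+//g;    # remove padding spaces, " C" becomes "C"

    # Adding 0 turns a padded string such as " 4743" into a number.
    my %record = (
        serialnumber => substr($line, 6, 5) + 0,
        aminoacid    => substr($line, 17, 3),
        x            => substr($line, 30, 8) + 0,
        y            => substr($line, 38, 8) + 0,
        z            => substr($line, 46, 8) + 0,
        element      => $element,
    );
    push @recs, \%record;
}

# freq: count how many times each element name occurs.
my %count;
foreach my $rec (@recs) {
    $count{ $rec->{element} }++;
}
foreach my $name (sort keys %count) {
    print "$name:$count{$name} ";
}
print "\n";
```

Splitting on whitespace would also work for the truncated sample data, but fixed columns seem safer when a field is blank or two fields run together.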
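I'm looking for feedback on what I have written (and still need to write) in the code below. I would especially appreciate any feedback on how to write my subs. I've included sample input data at the bottom.
Program structure and function descriptions as I see them: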
$^W = 1; # turn on warnings
use strict; # behave!
my @fields;
my @recs;
while ( <DATA> ) {
    chomp;
    @fields = split(/\s+/);
    push @recs, makeRecord(@fields);
}
for (my $i = 0; $i < @recs; $i++) {
    printRec( $recs[$i] );
}
my %command_table = (
    freq    => \&freq,
    length  => \&length,
    density => \&density,
    help    => \&help,
    quit    => \&quit
);
print "Enter a command: ";
while ( <STDIN> ) {
    chomp;
    my @line = split( /\s+/);
    my $command = shift @line;
    # anchor every alternative so that e.g. "lengthy" is rejected
    if ($command !~ /^(freq|length|density|help|quit)$/ ) {
        print "Command must be: freq, length, density, help or quit\n";
    }
    else {
        # pass any remaining words (e.g. the number d for density)
        $command_table{$command}->(@line);
    }
    print "Enter a command: ";
}
sub makeRecord
# Read the entire line and make records from the lines that contain the
# word ATOM or HETATM in the first column. Not sure how to do this:
{
    my %record =
    (
        serialnumber => shift,
        aminoacid    => shift,
        coordinates  => shift,
        element      => [ @_ ]
    );
    return \%record;
}
sub freq
# take an array of atom records, return a hash whose keys are
# distinct atom names and whose values are the frequencies of
# these atoms in the array.
{
    # not yet written
}
sub length
# take an array of atom records and return the max distance
# between all pairs of atoms in that array. My instructor
# advised this would be constructed as a for loop inside a for loop.
{
    # not yet written
}
sub density
# take an array of atom records and a number d and will return a
# hash whose keys are atom serial numbers and whose values are
# the number of atoms within that distance from the atom with that
# serial number.
{
    # not yet written
}
sub help
{
    print "To use this program, type either\n",
          "freq\n",
          "length\n",
          "density followed by a number, d,\n",
          "help\n",
          "quit\n";
}
sub quit
{
    exit 0;
}
# truncated for testing purposes. Actual data is approx. 100 columns
# and starts with ATOM or HETATM.
__DATA__
ATOM 4743 CG GLN A 704 19.896 32.017 54.717 1.00 66.44 C
ATOM 4744 CD GLN A 704 19.589 30.757 55.525 1.00 73.28 C
ATOM 4745 OE1 GLN A 704 18.801 29.892 55.098 1.00 75.91 O
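For the two distance-based commands, here is a minimal sketch of the "for loop inside a for loop" idea. It assumes each record stores its coordinates under the keys x, y and z (the code above stores them differently), and the sub names distance, max_distance and density_counts are made up for illustration. The name max_distance is used instead of length because Perl already has a built-in function called length. The records reuse the three sample atoms, and the counts are printed to the screen instead of being saved to a file.

```perl
use strict;
use warnings;

# Hypothetical records; coordinates copied from the sample data.
my @recs = (
    { serialnumber => 4743, x => 19.896, y => 32.017, z => 54.717 },
    { serialnumber => 4744, x => 19.589, y => 30.757, z => 55.525 },
    { serialnumber => 4745, x => 18.801, y => 29.892, z => 55.098 },
);

# Straight-line distance between two atom records.
sub distance {
    my ($p, $q) = @_;
    my $dx = $p->{x} - $q->{x};
    my $dy = $p->{y} - $q->{y};
    my $dz = $p->{z} - $q->{z};
    return sqrt($dx * $dx + $dy * $dy + $dz * $dz);
}

# Largest distance over all pairs of atoms.
sub max_distance {
    my @atoms = @_;
    my $max = 0;
    for (my $i = 0; $i < @atoms; $i++) {
        # start at $i + 1 so that each pair is visited only once
        for (my $j = $i + 1; $j < @atoms; $j++) {
            my $dist = distance($atoms[$i], $atoms[$j]);
            $max = $dist if $dist > $max;
        }
    }
    return $max;
}

# For each atom, count the other atoms no farther than $d away.
# Atoms with no neighbours never get a key, so zero counts are left out.
sub density_counts {
    my ($d, @atoms) = @_;
    my %within;
    for (my $i = 0; $i < @atoms; $i++) {
        for (my $j = 0; $j < @atoms; $j++) {
            next if $i == $j;    # do not compare an atom with itself
            if (distance($atoms[$i], $atoms[$j]) <= $d) {
                $within{ $atoms[$i]{serialnumber} }++;
            }
        }
    }
    return %within;
}

printf "max distance: %.3f\n", max_distance(@recs);

my %within = density_counts(2.0, @recs);
# sort { $a <=> $b } puts the serial numbers in numeric order
foreach my $serial (sort { $a <=> $b } keys %within) {
    print "$serial:$within{$serial}\n";
}
```

With d = 2.0 the three sample atoms should give counts of 1, 2 and 1. Both subs compare every atom with every other atom, so on a very large file they will be slow, but the logic stays easy to follow.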