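Another question for everyone. To reiterate, I am very new to Perl, and I apologize in advance for any silly mistakes.
I am trying to calculate the GC content of DNA sequences of different lengths. The file is in the following format: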
>gene 1
DNA sequence of specific gene
>gene 2
DNA sequence of specific gene
...etc...
Here is a small piece of the file:
>env
ATGCTTCTCATCTCAAACCCGCGCCACCTGGGGCACCCGATGAGTCCTGGGAA
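I have set up the counters and I read each line of DNA sequence, but at the moment the script keeps a running sum across all lines. I want it to read one sequence, print the counts for that sequence, and then move on to the next one, so that each sequence line gets its own base counts.
This is what I have so far.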
#!/usr/bin/perl
# necessary code to open and read the input file and create the output file.
use strict;
my $infile = "Lab1_seq.fasta";
open INFILE, $infile or die "$infile: $!";
my $outfile = "Lab1_seq_output.txt";
open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!";
# establishing the initial counts for each base
my $G = 0;
my $C = 0;
my $A = 0;
my $T = 0;
# initial loop created to read through each line
while ( my $line = <INFILE> ) {
    chomp $line;
    # a line starting with ">" is a header, so print it
    if ($line =~ /^>/){
        print OUTFILE "Gene: $line\n";
    }
    # otherwise count the content of the next line.
    # my percent counts seem to be incorrect due to my Total length counts skewing the following line. I am currently unsure how to fix that
    elsif ($line =~ /^[A-Z]/){
        my @array = split //, $line;
        my $array= (@array);
        # reset the counts of each variable
        $G = ();
        $C = ();
        $A = ();
        $T = ();
        foreach $array (@array){
            # if statements assess which base is present and keep a running total of the bases.
            if ($array eq 'G'){
                ++$G;
            }
            elsif ( $array eq 'C' ) {
                ++$C;
            }
            elsif ( $array eq 'A' ) {
                ++$A;
            }
            elsif ( $array eq 'T' ) {
                ++$T;
            }
        }
        # all is printed to the outfile
        print OUTFILE "G:$G\n";
        print OUTFILE "C:$C\n";
        print OUTFILE "A:$A\n";
        print OUTFILE "T:$T\n";
        print OUTFILE "Total length:_", ($A+=$C+=$G+=$T), "_base pairs\n";
        print OUTFILE "GC content is(percent):_", (($G+=$C)/($A+=$C+=$G+=$T)*100),"_%\n";
    }
}
# close the outfile and the infile
close OUTFILE;
close INFILE;
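Again, I feel I am on the right track and am just missing some basic fundamentals. Any help would be greatly appreciated.
My last problem is with the final counts that get printed. My percentage values are wrong. It looks as if the total is calculated and then that new value is folded back into the counts.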
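A likely cause of the wrong percentages, offered as a sketch rather than a definitive fix: `+=` is an assignment operator and groups right to left, so `$A+=$C+=$G+=$T` first adds `$T` into `$G`, then the new `$G` into `$C`, then the new `$C` into `$A`. The "Total length" line therefore overwrites three of the four counts, and the "GC content" line is computed from those altered values (and alters them again). Using plain `+` leaves the counts alone. The helper name `base_stats` and the hash-based counting below are assumptions made for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): tally the bases of one
# sequence string and return the counts, the total length and the GC percent.
sub base_stats {
    my ($seq) = @_;
    # Fresh counters for every call, so nothing carries over between sequences.
    my %count = (G => 0, C => 0, A => 0, T => 0);
    for my $base (split //, uc $seq) {
        $count{$base}++ if exists $count{$base};
    }
    # Plain + only reads the counts; += would store the sums back into them.
    my $total = $count{A} + $count{C} + $count{G} + $count{T};
    my $gc    = $total ? ($count{G} + $count{C}) / $total * 100 : 0;
    return (\%count, $total, $gc);
}

# In-memory stand-in for the FASTA file, using the sample from the question.
# Assumes each sequence sits on a single line, as in that sample.
my @lines = (
    ">env",
    "ATGCTTCTCATCTCAAACCCGCGCCACCTGGGGCACCCGATGAGTCCTGGGAA",
);

for my $line (@lines) {
    if ($line =~ /^>/) {
        print "Gene: $line\n";
    }
    elsif ($line =~ /^[A-Za-z]/) {
        my ($count, $total, $gc) = base_stats($line);
        print "$_:$count->{$_}\n" for qw(G C A T);
        print "Total length: $total base pairs\n";
        printf "GC content (percent): %.2f %%\n", $gc;
    }
}
```

Because the counters live inside the helper, each sequence line starts from zero, which gives the per-sequence output described above. If a sequence can span several lines, the lines would have to be joined before calling the helper; that case is not handled here.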