
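I am trying to shuffle the order of the sequences (lines) in a file using the following script. I am not sure how to "initialize" the values. Please help!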

print "Please enter filename (without extension): ";
my $input = <>;
chomp $input;

use strict;
use warnings;

print "Please enter total no. of sequence in fasta file: ";
my $orig_size = <>*2-1;
chomp $orig_size;

open INFILE, "$input.fasta"
   or die "Error opening input file for shuffling!";
open SHUFFLED, ">"."$input"."_shuffled.fasta"
   or die "Error creating shuffled output file!";

my @array  = (0); # Need to initialise 1st element in array1&2 for the shift function
my @array2 = (0);
my $i      = 1;
my $index  = 0;
my $index2 = 0;

while (my @line = <INFILE>){

    while ($i <= $orig_size) { 

        $array[$i] = $line[$index];
        $array[$i] =~ s/(.)\s/$1/seg;

        $index++;
        $array2[$i] = $line[$index];
        $array2[$i] =~ s/(.)\s/$1/seg;

        $i++;
        $index++;
    }
}

my $array  = shift (@array); 
my $array2 = shift (@array2);

for ($i = my $header_size; $i >= 0; $i--) { 

    my $j = int rand ($i+1);
    next if $i == $j;
    @array[$i,$j]  = @array[$j,$i];
    @array2[$i,$j] = @array2[$j,$i];
}

while ($index2 <= my $header_size) { 

    print SHUFFLED "$array[$index2]\n";
    print SHUFFLED "$array2[$index2]\n";
    $index2++;
}
close INFILE;
close SHUFFLED;

I get the following warnings:

Use of uninitialized value in substitution (s///) at fasta_corrector6.pl line 27, <INFILE> line 578914.
Use of uninitialized value in substitution (s///) at fasta_corrector6.pl line 31, <INFILE> line 578914.
Use of uninitialized value in numeric ge (>=) at fasta_corrector6.pl line 40, <INFILE> line 578914.
Use of uninitialized value in addition (+) at fasta_corrector6.pl line 41, <INFILE> line 578914.
Use of uninitialized value in numeric eq (==) at fasta_corrector6.pl line 42, <INFILE> line 578914.
Use of uninitialized value in numeric le (<=) at fasta_corrector6.pl line 47, <INFILE> line 578914.
Use of uninitialized value in numeric le (<=) at fasta_corrector6.pl line 50, <INFILE> line 578914.


3 Answers


First, you read in the whole input file:

  use IO::File;
  my @lines = IO::File->new($file_name)->getlines;

Then you shuffle the lines:

  use List::Util 'shuffle';
  my @shuffled_lines = shuffle(@lines);

Then you write them out:

  IO::File->new($new_file_name, "w")->print(@shuffled_lines);

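The Perl FAQ has an entry on how to shuffle an array. Another entry describes the many ways to read a file in one go. The Perl FAQs contain many examples and tidbits on how to do common things, which makes them a good place to continue learning more about Perl.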
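
The three steps above can also be combined with explicit error checks, because `IO::File->new` returns undef when a file cannot be opened and the chained method call then fails with a less helpful message. The following is only a minimal sketch using core `open`; the helper name `shuffle_file_lines` is hypothetical, and note that it shuffles individual lines, so a header and its sequence are not kept together.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Hypothetical helper (not from the answer above): read every line of
# $in, shuffle the lines, and write them to $out.
# Returns the number of lines written.
sub shuffle_file_lines {
    my ($in, $out) = @_;

    open my $in_fh, '<', $in or die "Cannot open '$in': $!";
    my @lines = <$in_fh>;          # list context: slurp all lines
    close $in_fh;

    open my $out_fh, '>', $out or die "Cannot create '$out': $!";
    print {$out_fh} shuffle(@lines);
    close $out_fh or die "Cannot close '$out': $!";

    return scalar @lines;
}
```

It could be called as `shuffle_file_lines("$input.fasta", "${input}_shuffled.fasta")`, reusing the file naming from the question.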

answered 2012-09-13T16:44:34.350

On your previous question I gave this answer, and noted that your code failed because you had not initialized a variable named $header_size used in a loop condition. Not only have you repeated that mistake, you have elaborated on it by starting to declare the variable with my each time you try to access it.

for ($i = my $header_size; $i >= 0; $i--) { 
#         ^^--- wrong!

while ($index2 <= my $header_size) { 
#                 ^^--- wrong!

A variable that is declared with my is empty (undef) by default. Because it is freshly declared each time, $header_size can never contain anything but undef here, and your while loop will run only once, because 0 <= undef evaluates as true (albeit with an uninitialized warning) whereas 1 <= undef does not.

Please take my advice and set a value for $header_size. And only use my when declaring a variable, not every time you use it.
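
As an illustration of that advice, here is a minimal sketch in which `$header_size` is declared and initialised exactly once and then reused without `my`. The sample data is made up, and setting `$header_size` to the last index of the header array is an assumption about what the question intends.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the two arrays in the question.
my @headers   = ('>seq1', '>seq2', '>seq3');
my @sequences = ('ACGT',  'GGCC',  'TTAA');

# Declared and initialised once (assumption: last valid index).
my $header_size = $#headers;

# Fisher-Yates shuffle; the same swap is applied to both arrays so
# that each header stays with its sequence.
for (my $i = $header_size; $i > 0; $i--) {
    my $j = int rand($i + 1);
    next if $i == $j;
    @headers[$i, $j]   = @headers[$j, $i];
    @sequences[$i, $j] = @sequences[$j, $i];
}

my $index2 = 0;
while ($index2 <= $header_size) {      # no "my" here: reuse the variable
    print "$headers[$index2]\n$sequences[$index2]\n";
    $index2++;
}
```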

A better solution

Seeing your errors above, it seems that your input files are rather large. If you have over 500,000 lines in your files, it means your script will consume large amounts of memory to run. It may be worthwhile for you to use a module such as Tie::File and work only with array indexes. For example:

use strict;
use warnings;
use Tie::File;
use List::Util qw(shuffle);

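my $filename = shift or die "Usage: $0 FILE\n"; # path of the file to read
tie my @file, 'Tie::File', $filename or die $!;
# Visit the line numbers in random order; the lines stay on disk.
for my $lineno (shuffle 0 .. $#file) {
    print "$file[$lineno]\n"; # Tie::File strips the line ending, so add it back
}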
untie @file; # all done
answered 2012-09-13T19:06:02.890

I cannot pinpoint what exactly went wrong, but there are a few oddities with your code:

The Diamond Operator

Perl's diamond operator <FILEHANDLE> reads a line from the filehandle. If no filehandle is provided, each command line argument (@ARGV) is treated as a file and read. If there are no arguments, STDIN is used. It is better to specify this yourself, e.g. with <STDIN>. You should also chomp before you do arithmetic with the line, not afterwards. Note that strings that do not start with a number are treated as numeric 0. You should check that the input is numeric (with a regex?) and include error handling.
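
A minimal sketch of that validation might look as follows. The helper name `parse_count` is hypothetical, and accepting only non-negative integers is an assumption about what the sequence count may be.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp first, then check that the input is a
# plain non-negative integer. Returns the number, or undef otherwise.
sub parse_count {
    my ($raw) = @_;
    return undef unless defined $raw;          # e.g. end of input
    chomp $raw;                                # strip the newline first
    return undef unless $raw =~ /\A[0-9]+\z/;  # digits only
    return $raw + 0;
}
```

With this in place, the prompt in the question could read `my $count = parse_count(scalar <STDIN>);`, followed by a `die` if `$count` is undefined, and only then the arithmetic `$count * 2 - 1`.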

The diamond/readline operator is context sensitive. If given in scalar context (e.g. a conditional or a scalar assignment) it returns one line. If given in list context, e.g. as a function parameter or an array assignment, it returns all remaining lines as a list. So

while (my @line = <INFILE>) { ...

will not give you one line but all lines and is thus equivalent to

my @line;
if (@line = <INFILE>) { ...
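
The difference between the two contexts can be seen in a small sketch. It reads from an in-memory string instead of a real file, which is only a convenience for the demonstration; the sample text is made up.

```perl
use strict;
use warnings;

# Made-up sample text, opened as an in-memory file.
my $text = "line one\nline two\nline three\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $first = <$fh>;   # scalar context: exactly one line
my @rest  = <$fh>;   # list context: every remaining line
my @none  = <$fh>;   # at end of file: an empty list

close $fh;

print "scalar context read: $first";
print "list context read ", scalar(@rest), " lines\n";
print "a second list read returned ", scalar(@none), " lines\n";
```

This is why a `while` loop around a list-context read runs its body only once: the second read returns an empty list, which is false.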

Array gymnastics

After you read in the lines, you try to do some manual chomping. Here I remove all trailing whitespace in @line, in a single line:

s/\s+$// foreach @line;

And here, I remove all non-leading whitespace (which is, in effect, what your regex does):

s/(?<!^)\s//g foreach @line;

To stuff an element alternatingly into two arrays, this might work as well:

for my $i (0 .. $#line) {
   if ($i % 2) {
     push @array1, shift @line;
   } else {
     push @array2, shift @line;
   }
}

or

my $i = 0;
while (@line) {
   push @{ $i++ % 2 ? \@array1 : \@array2 }, shift @line;
}

Manual bookkeeping of array indices is messy and error-prone.
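
One way to avoid index bookkeeping altogether is to keep each header and its sequence together in one record and shuffle the records. This is only a sketch with made-up data, and it assumes, as the question's code does, that every record is exactly two lines (one header line followed by one sequence line).

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Made-up sample input: header and sequence lines alternate.
my @line = (">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n", ">seq3\n", "TTAA\n");
chomp @line;

# Group the lines into [header, sequence] pairs.
my @records;
while (@line >= 2) {
    push @records, [ splice @line, 0, 2 ];
}

# Shuffle whole records, so a header never loses its sequence.
for my $record (shuffle @records) {
    my ($header, $sequence) = @$record;
    print "$header\n$sequence\n";
}
```

Compared with two parallel arrays, there is only one structure to shuffle and no loop counter to initialise.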

Your for-loop could be written more idiomatically as

for my $i (reverse 0 .. $header_size)

Do note that declaring $header_size inside the loop initialisation is possible if it was not declared before, but it will yield the undef value. You therefore assigned undef to $i, which leads to some of the warnings, as undef should not be used in arithmetic operations. An assignment always assigns the right side to the left side.

answered 2012-09-13T16:36:06.297