
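I have a hash of lists that is not getting populated.

I checked that the block at the end that adds to the hash is in fact being called on input. It should either add a singleton list if the key doesn't exist, or else push to the back of the list (referenced under the correct key) if it does.

I understand that the GOTO is ugly, but I have commented it out and it makes no difference.

The problem is that when printhits is called, nothing is printed, as if there were no values in the hash. I also tried each (%genomehits), no dice.

Thanks!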

#!/usr/bin/perl
use strict;
use warnings;

my $len = 11; # resolution of the peaks

#$ARGV[0] is input file
#$ARGV[1] is call number
# optional -s = spread number from call
# optional -o specify output file name
my $usage = "see arguments";
my $input = shift @ARGV or die $usage;
my $call = shift @ARGV or die $usage;
my $therest = join(" ",@ARGV) . " ";
print "the rest".$therest."\n";
my $spread = 1;
my $output = $input . ".out";
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;}
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;}

# initialize master hash
my %genomehits = ();

foreach (split ';', $input) {
    my $mygenename = "err_naming";
    if ($_ =~ /^(.+)-/) {$mygenename = $1;}

    open (INPUT, $_);
    my @wiggle = <INPUT>;

    &singlegene(\%genomehits, \@wiggle, $mygenename);

    close (INPUT);
}

&printhits;

#print %genomehits;
sub printhits {
    foreach my $key (%genomehits) {
        print "key: $key , values: ";
    foreach (@{$genomehits{$key}}) {
        print $_ . ";";
    }
    print "\n";
    }
}

sub singlegene {
 # let %hash be the mapping hash
 # let @mygene be the gene to currently process
 # let $mygenename be the name of the gene to currently process

    my (%hash) = %{$_[0]};
    my (@mygene) = @{$_[1]};
    my $mygenename = $_[2];

    my $chromosome;
    my $leftbound = -2;
    my $rightbound = -2;

    foreach (@mygene) {
        #print "Doing line ". $_ . "\n";

        if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;}

        if ($_ =~ "Step") {
            if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;}
            if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")};
            $leftbound = -2;
            $rightbound = -2;
            next;
        }

        my @line = split /\t/, $_;
        my $pos = $line[0];
        my $val = $line[-1];

        # above threshold for a call
        if ($val >= $call) {
            # start of range
            if ($rightbound != ($pos - 1)) {
                $leftbound = $pos;
                $rightbound = $pos;
            }
            # middle of range, increment rightbound
            else {
                $rightbound = $pos;
            }

            if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;}
        }
        # else reinitialize: not a call
        else {
            FORTHELASTONE:
            # typical case, in an ocean of OFFs
            if ($rightbound != ($pos-1)) {
                $leftbound = $pos;
            }
            else {
            # register the range
                my $range = $rightbound - $leftbound;
                for ($spread) {
                    $leftbound -= $len;
                    $rightbound += $len;
                }
                #print $range . "\n";

                foreach ($leftbound .. $rightbound) {
                    my $key = "$chromosome:$_";
                    if (not defined $hash{$key}) {
                        $hash{$key} = [$mygenename];
                    }
                    else { push @{$hash{$key}}, $mygenename; }
                }
            }
        }

    }

}

2 Answers


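You are passing a reference to %genomehits to the function singlegene, and then copying it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash, which goes away at the end of the function.

To fix it, use the reference directly with arrow notation. E.g.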

my $hash = $_[0];
...
$hash->{$key} = yadda yadda;
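
The difference can be seen in a small standalone sketch. The sub names add_to_copy and add_to_ref and the sample key 'chr1:100' are made up for this illustration; they are not from the question's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, named for this illustration only.

# Dereferences into a new lexical hash: new keys stay local to the sub.
sub add_to_copy {
    my (%hash) = %{ $_[0] };
    push @{ $hash{'chr1:100'} }, 'geneA';
    return;
}

# Keeps the reference: new keys land in the caller's hash.
sub add_to_ref {
    my ($hash) = @_;
    push @{ $hash->{'chr1:100'} }, 'geneA';
    return;
}

my %genomehits;

add_to_copy(\%genomehits);
print "after copy: ", scalar(keys %genomehits), " key(s)\n";   # 0 keys

add_to_ref(\%genomehits);
print "after ref: ", scalar(keys %genomehits), " key(s)\n";    # 1 key
```

Note that the copy is shallow: only keys added to the copy are lost, which is exactly the situation in the question, where the hash starts out empty.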
answered 2009-06-09T15:29:18.633

I think it's this line:

my (%hash) = %{$_[0]};

You're passing in a reference, but this statement makes a copy of your hash. Everything you add in singlegene is lost when you return.

Leave it as a hash reference and it should work.
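
As a rough sketch of what that could look like for the loop that registers a range: the helper name register_range and the parameter name $hits are invented here (in the question this logic sits inline in singlegene). Because push autovivifies the array on first use, the separate "not defined" branch is not needed.

```perl
use strict;
use warnings;

# register_range is a hypothetical helper for illustration only.
sub register_range {
    my ($hits, $chromosome, $leftbound, $rightbound, $genename) = @_;
    foreach my $pos ($leftbound .. $rightbound) {
        # $hits is a hash reference, so this writes to the caller's hash.
        # push creates the array reference if the key is new.
        push @{ $hits->{"$chromosome:$pos"} }, $genename;
    }
    return;
}

my %genomehits;
register_range(\%genomehits, 'chr1', 5, 7, 'geneA');
register_range(\%genomehits, 'chr1', 7, 8, 'geneB');

# Iterate over keys only; sorting just makes the output order stable.
foreach my $key (sort keys %genomehits) {
    print "key: $key , values: ", join(';', @{ $genomehits{$key} }), "\n";
}
```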

PS - Data::Dumper is your friend when large data structures are not behaving as expected. I'd sprinkle a few of these in your code...

use Data::Dumper; print Dumper \%genomehits;
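
A minimal runnable version of that idea, with sample data invented for the illustration. $Data::Dumper::Sortkeys is a standard Data::Dumper setting that prints hash keys in sorted order.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order

# Sample data made up for this illustration.
my %genomehits = (
    'chr1:100' => ['geneA', 'geneB'],
    'chr1:101' => ['geneA'],
);

# Pass a reference so the whole hash is dumped as one nested structure.
print Dumper(\%genomehits);
```

Calling this right after singlegene returns would show an empty structure for the code in the question, which points at the copy as the culprit.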

answered 2009-06-09T15:26:10.747