I have this output:

10dvex2_miRNA_ce.out.data|6361
10dvex2_miRNA_ce.out.data|6361
10dvex2_misc_RNA_ce.out.data|0
10dvex2_rRNA_ce.out.data|239

I process it with this Perl script:

#!/usr/bin/perl

use warnings;
use strict;

open(MYINPUTFILE, $ARGV[0]); # open for input
my @lines = <MYINPUTFILE>; # read file into list
my $count = 0;
print "Frag"."\t"."ncRNA"."\t"."Amount"."\n";

foreach my $lines (@lines){
my $pattern = $lines;
$pattern =~ s/(.*)dvex\d_(.*)_(.*).(out.data)\|(.*)/$1 $2   $3  $5/g;
$count += $5;
print $1."\t".$2.$3."\t".$5."\n";
}
close(MYINPUTFILE);
exit;

It extracts this information:

Frag    ncRNA   Amount
10  miRNAce 6361
10  misc_RNAce  0
10  rRNAce  239

But in the Amount column I would like to report these numbers divided by the total (6600). In this case, I want this output:

Frag    ncRNA   Amount
10  miRNAce 0.964
10  misc_RNAce  0
10  rRNAce  0.036

My problem is getting the TOTAL inside the loop so that I can normalize the data. Any ideas?

2 Answers

Perhaps the following will help:

use strict;
use warnings;

my ( %hash, $total, %seen, @array );

while (<>) {
    next if $seen{$_}++;
    next unless /(\d+).+?_([^.]+).+\|(\d+)$/;    # skip lines that do not match
    $hash{$1}{$2} = $3;
    $total += $3;
}

print "Frag\tncRNA\tAmount\n";

while ( my ( $key1, $val1 ) = each %hash ) {
    while ( my ( $key2, $val2 ) = each %$val1 ) {
        my $frac = $val2 / $total == 0 ? 0 : sprintf( '%.3f', $val2 / $total );
        push @array, "$key1\t$key2\t$frac\n";
    }
}

print map { $_->[0] }
  sort    { $b->[1] <=> $a->[1] }
  map { [ $_, (split)[2] ] }
  @array;

Output on your dataset:

Frag    ncRNA   Amount
10  miRNA_ce    0.964
10  rRNA_ce 0.036
10  misc_RNA_ce 0

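Identical lines are skipped, and the required elements are then captured from each line. A running total is kept for the later calculation. Your desired output appeared to be sorted from high to low, which is why each record is pushed onto @array. If sorting is not needed, however, you can just print each line in the second loop and omit the Schwartzian transform on @array.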
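
If the sort is not needed, the idea can be sketched roughly as follows. This is only a minimal sketch: the sub name normalize_lines and the inline sample lines are made up for illustration, it reuses the regex from the code above, and reporting 0 when the total is 0 is my own assumption. Records are kept in first-seen order and formatted once the total is known.

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw input lines and returns tab-separated
# rows in first-seen order (no sorting, so no Schwartzian transform).
sub normalize_lines {
    my @lines = @_;
    my ( %seen, @records );
    my $total = 0;

    for my $line (@lines) {
        chomp $line;
        next if $seen{$line}++;                               # skip identical lines
        next unless $line =~ /(\d+).+?_([^.]+).+\|(\d+)$/;    # skip non-matching lines
        push @records, [ $1, $2, $3 ];
        $total += $3;                                         # running total
    }

    return map {
        my ( $frag, $rna, $n ) = @$_;
        # Assumption: print a plain 0 for zero counts or a zero total.
        my $frac = ( $total && $n ) ? sprintf( '%.3f', $n / $total ) : 0;
        "$frag\t$rna\t$frac";
    } @records;
}

my @sample = (
    '10dvex2_miRNA_ce.out.data|6361',
    '10dvex2_miRNA_ce.out.data|6361',
    '10dvex2_misc_RNA_ce.out.data|0',
    '10dvex2_rRNA_ce.out.data|239',
);

print "Frag\tncRNA\tAmount\n";
print "$_\n" for normalize_lines(@sample);
```

With real input you would pass the lines read from <> instead of @sample.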

Hope this helps!

answered 2012-11-06T20:47:20.143

To do this, you need to make two passes over the data.

#! /usr/bin/env perl

use warnings;
use strict;

print join("\t",qw'Frag ncRNA Amount'),"\n";

my @data;
my $total = 0;

# parse the lines
while( <> ){
  my @elem = /(.+?)(?>dvex)\d_(.+)_([^._]+)[.]out[.]data[|](\d+)/;
  next unless @elem;

  # running total
  $total += $elem[-1];

  # combine $2 and $3
  splice @elem, 1, 2, $2.$3; # $elem[1].$elem[2];

  push @data, \@elem;
}

# print them
for( @data ){
  my @copy = @$_;
  $copy[-1] = $copy[-1] / $total;
  $copy[-1] = sprintf('%.3f', $copy[-1]) if $copy[-1];
  print join("\t",@copy),"\n";
}
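
One edge case in the division above: if every count is 0, $total is 0 and Perl dies with "Illegal division by zero". A possible guard is sketched below. The sub name fraction is hypothetical, and returning a plain 0 for a zero total is an assumption about what you would want to see in the Amount column.

```perl
use strict;
use warnings;

# Hypothetical helper: format $count / $total to three decimals.
# Assumption: a zero count or a zero total is reported as a plain 0.
sub fraction {
    my ( $count, $total ) = @_;
    return 0 unless $total && $count;
    return sprintf( '%.3f', $count / $total );
}

print fraction( 6361, 6600 ), "\n";    # 0.964
print fraction( 239,  6600 ), "\n";    # 0.036
print fraction( 0,    0 ),    "\n";    # 0
```

In the printing loop, this could replace the two assignments with $copy[-1] = fraction( $copy[-1], $total );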
answered 2012-11-06T21:00:36.893