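For a genetic analysis, I am trying to convert a 2-probability file (10 GB) into a 3-probability file. Basically, I have to insert a third column after every two existing values, and this third column can be computed as 1 - (first value + second value). How would you do it?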

From:

0.800   0.200   0.000   0.200   0.800   0.200
0.000   0.900   0.000   0.900   0.000   0.900
0.900   0.010   0.900   0.010   0.770   0.010

(the file contains many columns and rows)

To:

0.800   0.200   0.000   0.000   0.200   0.800   0.800   0.200   0.000
0.000   0.900   0.100   0.000   0.900   0.100   0.000   0.900   0.100
0.900   0.010   0.090   0.900   0.010   0.090   0.770   0.010   0.220
3 Answers

awk

awk '{for(i=1;i<=NF;i+=2)$(i+1)=$(i+1)OFS sprintf("%.3f",1-$(i+1)-$i)}1' OFS='\t' file
0.800   0.200   0.000   0.000   0.200   0.800   0.800   0.200   0.000
0.000   0.900   0.100   0.000   0.900   0.100   0.000   0.900   0.100
0.900   0.010   0.090   0.900   0.010   0.090   0.770   0.010   0.220
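
For anyone who finds the one-liner dense, here is a sketch of the same idea spelled out. Instead of appending to every second field in place, it builds each output line explicitly. The function name `add_third_prob` is hypothetical, and the sketch assumes every line has an even number of whitespace-separated fields (a trailing unpaired field would be dropped).

```shell
# Hypothetical helper: reads pairs of probabilities on stdin and
# writes tab-separated triples on stdout, one input line at a time.
add_third_prob() {
  awk 'BEGIN { OFS = "\t" }
  {
    out = ""
    sep = ""
    # Walk the fields two at a time; i + 1 <= NF skips an unpaired last field.
    for (i = 1; i + 1 <= NF; i += 2) {
      # Third probability is whatever remains of 1, printed with 3 decimals.
      third = sprintf("%.3f", 1 - ($i + $(i + 1)))
      out = out sep $i OFS $(i + 1) OFS third
      sep = OFS
    }
    print out
  }'
}
```

It could be called as `add_third_prob < file > file.3prob` (file names are placeholders). Like the one-liner, it processes one line at a time, so the 10 GB input never has to fit in memory.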
answered 2013-03-20T13:45:10.513

#! /usr/bin/env perl

use strict;
use warnings;

*ARGV = *DATA;  # for demo only

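while (<>) {
  chomp;

  my @fields = split;  # split on runs of whitespace
  my @output;
  # Consume the fields two at a time and append the derived third probability.
  while (@fields >= 2) {
    my($x,$y) = splice @fields, 0, 2;

    push @output, $x, $y, sprintf "%.3f", 1.0 - ($x + $y);
  }

  # A leftover unpaired field (odd field count) is passed through unchanged.
  print join(" " x 3, @output, @fields), "\n";
}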

__DATA__
0.800   0.200   0.000   0.200   0.800   0.200
0.000   0.900   0.000   0.900   0.000   0.900
0.900   0.010   0.900   0.010   0.770   0.010

Output:

0.800   0.200   0.000   0.000   0.200   0.800   0.800   0.200   0.000
0.000   0.900   0.100   0.000   0.900   0.100   0.000   0.900   0.100
0.900   0.010   0.090   0.900   0.010   0.090   0.770   0.010   0.220
answered 2013-03-20T13:47:30.543

#!/usr/bin/perl
use strict; use warnings;

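# One tab-separated triple, each value printed with three decimals.
my $template = join "\t", ("%.3f")x3;

while (<>) {
  my @fields = split;
  @fields % 2 == 0 or die "Uneven number of fields";
  # Take the fields in pairs; the third value is what remains of 1.
  while (my ($x, $y) = splice @fields, 0, 2) {
    printf $template, $x, $y, 1 - ($x + $y);
    print  @fields ? "\t" : "\n";  # tab between triples, newline after the last
  }
}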

Usage: perl script.pl <input >output-file

answered 2013-03-20T13:56:14.467