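I'm looking for a solution to this problem: I have a tab-delimited file like the one shown in the block quote below. As you can see, some rows share the same leading part (the bold fields).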

chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84
chr1 220871675 220962596 G2 P2369-152-116
chr1 220871675 220962596 G2 P2371-180-82
chr1 220871675 220962596 G2 P2372-223-129
chr1 220871675 220962596 G2 P2373-153-96
chr1 220871675 220962596 G2 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146
chr5 126198405 126416440 G3 P9334-151-116

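Using AWK or Perl, how can I get the following output while keeping the tab-delimited format? The general idea is to merge the rows that share the same leading fields and append the last field of each one.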

chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84 P2369-152-116 P2371-180-82 P2372-223-129 P2373-153-96 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146 P9334-151-116

2 Answers

2
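use strict;
use warnings;

my (%hash, @order);
while (<DATA>) {
    # Split each row into the key (everything before the last field) and the last field.
    my ($x, $y) = /^(.*)\s([-\w]+)$/ or next;
    push @order, $x unless exists $hash{$x};    # remember first-seen order
    push @{ $hash{$x} }, $y;
}
# Print each key once, followed by its collected values, all tab-separated.
print join("\t", $_, @{ $hash{$_} }), "\n" for @order;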
__DATA__
chr4 164440449 165354407 G1 P8002-51-75
chr1 220871675 220962596 G2 P2368-132-84
chr1 220871675 220962596 G2 P2369-152-116
chr1 220871675 220962596 G2 P2371-180-82
chr1 220871675 220962596 G2 P2372-223-129
chr1 220871675 220962596 G2 P2373-153-96
chr1 220871675 220962596 G2 P2370-104-78
chr5 126198405 126416440 G3 P9333-135-146
chr5 126198405 126416440 G3 P9334-151-116
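
Since the question also mentions AWK, the same hash-based grouping can be sketched in awk. This is only an illustration under stated assumptions: the input really is tab-separated, and the function name merge_rows and the file name sample.tsv are made up for the example. Keys are printed in the order they first appear.

```shell
# merge_rows FILE: group tab-separated rows by all fields except the last,
# printing each key once followed by every last field seen for it.
# (merge_rows is a hypothetical helper name used only for this sketch.)
merge_rows() {
    awk 'BEGIN { FS = OFS = "\t" }
    NF < 2 { next }                                 # skip blank or one-field lines
    {
        key = $1
        for (i = 2; i < NF; i++) key = key OFS $i   # all fields but the last
        if (!(key in vals)) {                       # first time this key is seen
            order[++n] = key
            vals[key] = $NF
        } else {
            vals[key] = vals[key] OFS $NF           # append the last field
        }
    }
    END { for (i = 1; i <= n; i++) print order[i], vals[order[i]] }' "$1"
}

# Hypothetical sample file (the name sample.tsv is an assumption).
printf 'chr4\t164440449\t165354407\tG1\tP8002-51-75\n'   >  sample.tsv
printf 'chr1\t220871675\t220962596\tG2\tP2368-132-84\n'  >> sample.tsv
printf 'chr1\t220871675\t220962596\tG2\tP2369-152-116\n' >> sample.tsv
printf 'chr5\t126198405\t126416440\tG3\tP9333-135-146\n' >> sample.tsv

merge_rows sample.tsv
```

Because every value is kept in memory until the END block, this variant also merges rows whose keys are not adjacent in the file.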
Answered 2012-07-09T09:19:51.327
1

One way using perl:

perl -ane '
    ## Save all fields but the last one as the key to compare between rows.
    $key = join qq|\t|, @F[ 0 .. $#F - 1 ];

    ## In first line or when current key is equal to previous key, save last
    ## field in an array and stop processing current row.
    if ( $. == 1 || $key eq $pkey ) {
        $pkey = $key;
        push @value, $F[ $#F ];
        next unless eof;
    }

    ## At this point, keys between rows are different, so print previous
    ## key with its values and begin to save the new one.
    printf qq|%s\n|, join qq|\t|, $pkey, @value;
    @value = ();
    push @value, $F[ $#F ];

    ## Exception: Last line with a new key, print it.
    if ( eof && $pkey ne $key ) {
        printf qq|%s\n|, join qq|\t|, $key, @value;
    }

    ## Save previous key.
    $pkey = $key;

' infile

Assuming infile contains the data from your question, the output will be:

chr4    164440449       165354407       G1      P8002-51-75
chr1    220871675       220962596       G2      P2368-132-84    P2369-152-116   P2371-180-82    P2372-223-129   P2373-153-96    P2370-104-78
chr5    126198405       126416440       G3      P9333-135-146   P9334-151-116
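
The same "compare with the previous key" idea can also be sketched in awk, which the question mentions. This is a rough equivalent rather than a translation of the one-liner above: it assumes tab-separated input in which rows with the same key are adjacent, and the names merge_adjacent and grouped.tsv are hypothetical.

```shell
# merge_adjacent FILE: merge consecutive rows that share all fields but the
# last, keeping only the current group in memory.
# (merge_adjacent is a hypothetical helper name used only for this sketch.)
merge_adjacent() {
    awk 'BEGIN { FS = OFS = "\t" }
    NF < 2 { next }                                 # skip blank or one-field lines
    {
        key = $1
        for (i = 2; i < NF; i++) key = key OFS $i   # all fields but the last
        if (seen && key == prev) {
            line = line OFS $NF                     # same key: extend the pending row
        } else {
            if (seen) print line                    # key changed: flush the previous row
            line = key OFS $NF
            prev = key
            seen = 1
        }
    }
    END { if (seen) print line }' "$1"              # flush the final group
}

# Hypothetical sample file (the name grouped.tsv is an assumption).
printf 'chr1\t220871675\t220962596\tG2\tP2368-132-84\n'  >  grouped.tsv
printf 'chr1\t220871675\t220962596\tG2\tP2369-152-116\n' >> grouped.tsv
printf 'chr5\t126198405\t126416440\tG3\tP9333-135-146\n' >> grouped.tsv
printf 'chr5\t126198405\t126416440\tG3\tP9334-151-116\n' >> grouped.tsv

merge_adjacent grouped.tsv
```

If equal keys may be scattered through the file, sorting the input first or collecting values in an array keyed by the leading fields would be needed instead.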
Answered 2012-07-09T09:14:18.020