
Given the input file below, is there a command or other way in Linux to convert it into the desired output file that follows?

Input file:

Column_1     Column_2  
scaffold_A   SNP_marker1
scaffold_A   SNP_marker2
scaffold_A   SNP_marker3
scaffold_A   SNP_marker4
scaffold_B   SNP_marker5
scaffold_B   SNP_marker6
scaffold_B   SNP_marker7
scaffold_C   SNP_marker8
scaffold_A   SNP_marker9
scaffold_A   SNP_marker10

Desired Output file:

Column_1     Column_2  
scaffold_A   SNP_marker1;SNP_marker2;SNP_marker3;SNP_marker4
scaffold_B   SNP_marker5;SNP_marker6;SNP_marker7
scaffold_C   SNP_marker8
scaffold_A   SNP_marker9;SNP_marker10

I was thinking of using grep, uniq, etc., but couldn't figure out how to get this done.


5 Answers

2

Python solution (assuming the file names are passed on the command line):

from __future__ import print_function  # not needed with Python 3
import sys

# usage: python script.py infile outfile
with open(sys.argv[1]) as infile, open(sys.argv[2], 'w') as outfile:
    outfile.write(infile.readline())  # copy the header line unchanged
    col_one, col_two = infile.readline().split()  # first data row
    col_two = [col_two]  # markers collected for the current run
    for line in infile:
        data = line.split()
        if not data:  # skip blank lines
            continue
        if col_one != data[0]:
            # the key changed: write out the finished run, start a new one
            print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
            col_one = data[0]
            col_two = [data[1]]
        else:
            col_two.append(data[1])
    # write out the last run
    print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
answered 2013-07-24T13:50:59.900
2

Perl solution:

perl -lane 'sub output {
                print "$last\t", join ";", @buff;
            }
            $last //= $F[0];
            if ($F[0] ne $last) {
               output();
               undef @buff;
               $last = $F[0];
            }
            push @buff, $F[1];
            }{ output();'
answered 2013-07-24T11:37:42.660
0

An awk solution in a bash script:

#!/bin/bash 

awk '
BEGIN{
    str = ""
}
{
    if ( str != $1 ) {
        if ( NR != 1 ){
            printf("\n")
        }
        str = $1
        printf("%s\t%s",$1,$2)
    } else if ( str == $1 ) {
        printf(";%s",$2)
    }
}
END{
        printf("\n")
}' your_file.txt
answered 2013-07-24T13:19:28.447
0

You can also try the following solution in bash:

cat input.txt | while read L; do y=`echo $L | cut -f1 -d' '`; { test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`"; } || { x="$y";echo -en "\n$L"; }; done

Or, in a more human-readable form for review:

cat input.txt | while read L;
do
  y=`echo $L | cut -f1 -d' '`;
  {
    test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`";
  } || 
  {
    x="$y";echo -en "\n$L"; 
  };
done

Note that the well-formatted output of this script depends on the behavior of bash's echo command (it uses echo -n and echo -en).

answered 2013-07-31T11:31:36.673
0

If you don't mind using Python, it has itertools.groupby, which serves this purpose:

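# file: combine.py
from __future__ import print_function  # not needed with Python 3
import itertools

with open('data.txt') as f:
    # one list of columns per non-blank line
    data = [row.split() for row in f if row.strip()]

# groupby yields one group per run of consecutive rows with the same first column
for column1, rows_group in itertools.groupby(data, key=lambda row: row[0]):
    print(column1, ';'.join(row[1] for row in rows_group))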

Save this script as combine.py. Assuming your input file is data.txt, run it to get the desired output:

python combine.py

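Discussion

  • The result of the with open(...) block is data, a list of rows, where each row is itself a list of columns.
  • The itertools.groupby function takes an iterable, in this case a list. You tell it how to group the rows by giving it a key (column1). It only groups consecutive rows with equal keys, which is why scaffold_A can appear twice in the output.
  • rows_group is an iterator over the consecutive rows that share the same column1 value.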
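
To make the run-based grouping concrete, here is a minimal sketch using a small in-memory sample instead of a file. The helper name collapse_runs and the sample rows are only illustrative assumptions, not part of the answer's script. It shows that a key which reappears later (scaffold_A here) starts a new group rather than being merged with the earlier one:

```python
import itertools


def collapse_runs(rows):
    """Join the second column of consecutive rows sharing the first column.

    collapse_runs is a hypothetical helper used only for this illustration.
    """
    result = []
    # groupby starts a new group every time the key changes, so equal keys
    # that are not adjacent end up in separate groups.
    for key, group in itertools.groupby(rows, key=lambda row: row[0]):
        result.append((key, ';'.join(row[1] for row in group)))
    return result


# Shortened sample data modelled on the question's input.
rows = [
    ('scaffold_A', 'SNP_marker1'),
    ('scaffold_A', 'SNP_marker2'),
    ('scaffold_B', 'SNP_marker5'),
    ('scaffold_A', 'SNP_marker9'),
]

for key, markers in collapse_runs(rows):
    print(key, markers, sep='\t')
```

If the rows were sorted by the first column beforehand, both scaffold_A runs would collapse into one line, which is not what the question asks for, so the input order is left untouched.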
answered 2013-08-02T16:03:52.810