linux - Is there a command to merge multiple files in the shell?

Question

For example, there are 5 numbers => [1,2,3,4,5] and 3 groups

File 1 (group 1):

1
3
5

File 2 (group 2):

3
4

File 3 (group 3):

1
5

Output (column 1: whether in Group 1, column 2: whether in Group 2, column 3: whether in Group 3 [NA means not..]):

1 NA 1
3 3 NA
NA 4 NA
5 NA 5

Or something similar (+ means present, - means absent):

1 + - +
3 + + -
4 - + -
5 + - +

I tried join and merge, but it looks like neither of them works with multiple files.. (e.g., 8 files)

score 2 · Accepted Answer

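You say there are numbers 1-5, but as far as I can tell, that is irrelevant to your desired output. You only use the numbers that are actually found in the files. This code will do what you want: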

use strict;
use warnings;
use feature 'say';

my @hashes;
my %seen;
local $/;   # read entire file at once
while (<>) {
    my @nums = split;                          # split file into elements
    $seen{$_}++ for @nums;                     # dedupe elements
    push @hashes, { map { $_ => $_ } @nums };  # map into hash
}

my @all = sort { $a <=> $b } keys %seen;       # sort deduped elements
# my @all = 1 .. 5;                            # OR: provide hard-coded list

for my $num (@all) {                           # for all unique numbers
    my @fields;
    for my $href (@hashes) {                   # check each hash
        push @fields, $href->{$num} // "NA";   # enter "NA" if not found
    }
    say join "\t", @fields;                    # print the fields
}

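You can replace the sorted, deduplicated list in @all with just my @all = 1 .. 5, or with any other valid list. The script will then also print rows for those numbers, with an "NA" field for every file in which a number is missing.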

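You should also be aware that this relies on your file contents being numbers, but only for the sorting of the @all array. If you replace that with your own list or your own sort routine, you can use any values.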

The script takes any number of files and processes them. For example:

$ perl script.pl f1.txt f2.txt f3.txt
1       NA      1
3       3       NA
NA      4       NA
5       NA      5
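
For the +/- layout mentioned in the question, the same idea (record which values each file contains, then print one row per distinct value) can also be sketched in awk. This is only a minimal sketch and not part of the answer above: the function name presence_table and the sample file names are made up for illustration, and it assumes one value per line and no empty input files (an empty file would not get a column).

```shell
#!/bin/bash
# presence_table FILE...: print one row per distinct value, followed by
# one +/- column per input file (hypothetical helper name).
presence_table() {
    awk '
        FNR == 1 { nfiles++ }              # a new input file starts here
        NF {                               # ignore blank lines
            seen[$1] = 1                   # every distinct value
            has[$1, nfiles] = 1            # value occurs in file number nfiles
        }
        END {
            for (v in seen) {
                row = v
                for (f = 1; f <= nfiles; f++)
                    row = row " " (((v, f) in has) ? "+" : "-")
                print row
            }
        }
    ' "$@" | sort -n                       # awk iteration order is unspecified
}

# Demo with the sample data from the question (temporary files).
dir=$(mktemp -d)
printf '%s\n' 1 3 5 > "$dir/f1.txt"
printf '%s\n' 3 4   > "$dir/f2.txt"
printf '%s\n' 1 5   > "$dir/f3.txt"
presence_table "$dir/f1.txt" "$dir/f2.txt" "$dir/f3.txt"
rm -r "$dir"
```

With the three sample files this prints the second table from the question (1 + - +, 3 + + -, 4 - + -, 5 + - +). Sorting is done by sort -n, so, as with the Perl script, non-numeric values would need a different sort.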

Thanks to Brent Stewart for figuring out what the OP meant.

score 0 · Accepted Answer

For two files, you can easily use join as shown below (assuming file1 and file2 are sorted):

$ join -e NA -o 1.1,2.1 -a 1 -a 2  file1 file2
1 NA
3 3
NA 4
5 NA

If you have more than two files, it gets more complicated.
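
To give an idea of the extra complexity, here is a sketch for exactly three files that chains two join calls. It is an illustration under assumptions, not a general solution: the function name join3 is made up, the files must be sorted in the order join expects with one value per line, and every additional file would need another join stage with a longer -o list.

```shell
#!/bin/bash
# join3 A B C: full outer join of three sorted single-column files
# (hypothetical helper name).
join3() {
    # Stage 1: key, value from A, value from B (NA where missing).
    join -a 1 -a 2 -e NA -o 0,1.1,2.1 "$1" "$2" |
        # Stage 2: carry both columns along and add the value from C.
        join -a 1 -a 2 -e NA -o 0,1.2,1.3,2.1 - "$3" |
        # Drop the helper key column again.
        cut -d ' ' -f 2-
}

# Demo with the sample data from the question (temporary files).
dir=$(mktemp -d)
printf '%s\n' 1 3 5 > "$dir/file1"
printf '%s\n' 3 4   > "$dir/file2"
printf '%s\n' 1 5   > "$dir/file3"
join3 "$dir/file1" "$dir/file2" "$dir/file3"
rm -r "$dir"
```

The key column (-o 0) is kept through the pipeline because an unpaired line has no value in the other file to join on; it is only removed at the end.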

Here is a brute-force grep solution:

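#!/bin/bash
files=(file1 file2 file3)
# One row per distinct value, one column per file.
sort -nu "${files[@]}" | while IFS= read -r line; do
    for f in "${files[@]}"; do
        if grep -qFx -- "$line" "$f"; then   # exact whole-line match
            printf '%s\t' "$line"            # never use data as the format string
        else
            printf 'NA\t'
        fi
    done
    printf '\n'
done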

Output:

1       NA      1
3       3       NA
NA      4       NA
5       NA      5

score 0 · Accepted Answer

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

my @lines;
my $filecount = 0;

# parse: use each value as an array index and mark it as seen in the current file
for my $filename (@ARGV){
  open my $fh, '<', $filename;
  while( my $line = <$fh> ){
    chomp($line);
    next unless length $line;
    $lines[$line][$filecount]++;
  }
  close $fh;
}continue{
  $filecount++;
}

# print: one row per value seen, with an X for each file that contains it
for my $linenum ( 1..$#lines ){
  my $line = $lines[$linenum];
  next unless $line;

  print ' ' x (5-length $linenum), $linenum, ' ';

  for my $elem( @$line ){
    print $elem ? 'X' : ' '
  }
  print "\n";
}

score 0 · Accepted Answer

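If your input files are monotonically increasing and, as your sample input suggests, contain only one integer per line, you can simply preprocess the input files and use paste: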

for i in file{1,2,3}; do  # List input files
  awk '{ a += 1; while( $1 > a ) { print "NA"; a += 1 }} 1' "$i" > "$i.out"
done
paste file{1,2,3}.out

This leaves the trailing entries in some columns empty. Fixing that is left as an exercise for the reader.
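
One possible way to approach that exercise is to find the largest value across all files first and let awk pad each column up to it in an END block. This is a sketch under the same assumptions as above (increasing positive integers, one per line); the function name paste_padded and the temporary file layout are made up for illustration.

```shell
#!/bin/bash
# paste_padded FILE...: preprocess each file as in the loop above, but pad
# every column with NA up to the largest value found in any file
# (hypothetical helper name; the body runs in a subshell).
paste_padded() (
    max=$(sort -n "$@" | tail -n 1)      # largest value across all inputs
    tmp=$(mktemp -d) || exit 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        awk -v max="$max" '
            { a += 1; while ($1 > a) { print "NA"; a += 1 } }  # fill gaps
            1                                                  # print the value
            END { while (a < max) { print "NA"; a += 1 } }     # pad the tail
        ' "$f" > "$tmp/$(printf "%04d" "$n").out"
    done
    paste "$tmp"/*.out                   # zero-padded names keep column order
    rm -r "$tmp"
)

# Demo with the sample data from the question (temporary files).
dir=$(mktemp -d)
printf '%s\n' 1 3 5 > "$dir/file1"
printf '%s\n' 3 4   > "$dir/file2"
printf '%s\n' 1 5   > "$dir/file3"
paste_padded "$dir/file1" "$dir/file2" "$dir/file3"
rm -r "$dir"
```

Values that occur in no file (2 in the sample) still produce a row consisting only of NA, just as in the original loop; removing such rows would need an extra filter.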
