I have two tab-separated files that I need to align. For example:

File 1:      File 2:
AAA 123      BBB 345
BBB 345      CCC 333
CCC 333      DDD 444

(These are large files, potentially thousands of lines!)

What I would like is for the output to look like this:

AAA 123
BBB 345  BBB 345
CCC 333  CCC 333
         DDD 444

Ideally I would like to do this in Perl, but I am not sure how. Any help would be greatly appreciated.

4 Answers

If this is only about building a data structure, it can be quite easy.

#!/usr/bin/env perl

# usage: script.pl file1 file2 ...

use strict;
use warnings;

my %data;
while (<>) {
  chomp;
  my ($key, $value) = split;
  push @{$data{$key}}, $value;
}

use Data::Dumper;
print Dumper \%data;

You can then output the data in whatever format you like. If it really is about using the files exactly as they are, it gets a bit trickier.
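As a rough illustration of "output in whatever format you like", the hash of arrays built above could be printed as one tab-separated row per key. The helper name `format_rows` and the hard-coded sample data are assumptions made for this sketch, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of arrays (key => [values]) into
# one tab-separated row per key, in sorted key order.
sub format_rows {
    my ($data) = @_;
    return map { join("\t", $_, @{ $data->{$_} }) } sort keys %$data;
}

# Sample data shaped the way the reading loop would build it
# from the two files in the question.
my %data = (
    AAA => [123],
    BBB => [345, 345],
    CCC => [333, 333],
    DDD => [444],
);

print "$_\n" for format_rows(\%data);
```

Note that this layout cannot tell which file a value came from, so it does not reproduce the two-column alignment asked for in the question.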

answered 2012-05-03T20:13:49.800

As ikegami mentioned, the following assumes that the contents of the files are arranged as shown in your example.

use strict;
use warnings;

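# Three-argument open keeps the mode separate from the file name.
open my $file1, '<', 'file1.txt' or die "Cannot open file1.txt: $!";
open my $file2, '<', 'file2.txt' or die "Cannot open file2.txt: $!";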

my $file1_line = <$file1>;
print $file1_line;

while ( my $file2_line = <$file2> ) {
    if( defined( $file1_line = <$file1> ) ) {
        chomp $file1_line;
        print $file1_line;
    }

    my $tabs = $file1_line ? "\t" : "\t\t";
    print "$tabs$file2_line";
}

close $file1;
close $file2;

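Looking at your example, some identical key/value pairs appear in both files. Given this, it seems you want to show the pairs unique to file 1, the pairs unique to file 2, and the pairs common to both. If that is the case (and you are not trying to match the files' pairs by key or by value), you can use List::Compare: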

use strict;
use warnings;
use List::Compare;

open my $file1, '<', 'file1.txt' or die "Cannot open file1.txt: $!";
my @file1 = <$file1>;
close $file1;

open my $file2, '<', 'file2.txt' or die "Cannot open file2.txt: $!";
my @file2 = <$file2>;
close $file2;

my $lc = List::Compare->new(\@file1, \@file2);

my @file1Only = $lc->get_Lonly; # L(eft array)only
for(@file1Only) { print }

my @bothFiles = $lc->get_intersection;
for(@bothFiles) { chomp; print "$_\t$_\n" }

my @file2Only = $lc->get_Ronly; # R(ight array)only
for(@file2Only) { print "\t\t$_" }
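If installing a CPAN module is not an option, roughly the same three-way split can be sketched with plain hashes. This is a substitute technique, not List::Compare itself; the helper name `split_lines` and the in-memory sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split two lists of lines into
# (left-only, common, right-only), each sorted and de-duplicated.
sub split_lines {
    my ($left, $right) = @_;
    my %in_left  = map { $_ => 1 } @$left;
    my %in_right = map { $_ => 1 } @$right;
    my @left_only  = grep { !$in_right{$_} } sort keys %in_left;
    my @common     = grep {  $in_right{$_} } sort keys %in_left;
    my @right_only = grep { !$in_left{$_}  } sort keys %in_right;
    return (\@left_only, \@common, \@right_only);
}

# Sample lines (already chomped) matching the question's example.
my @file1 = ("AAA\t123", "BBB\t345", "CCC\t333");
my @file2 = ("BBB\t345", "CCC\t333", "DDD\t444");

my ($only1, $common, $only2) = split_lines(\@file1, \@file2);
print "$_\n"     for @$only1;    # pairs found only in file 1
print "$_\t$_\n" for @$common;   # pairs found in both files
print "\t\t$_\n" for @$only2;    # pairs found only in file 2
```

Like the List::Compare version, this compares whole lines, so two lines with the same key but different values are treated as unrelated.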
answered 2012-05-03T20:01:56.093

Assuming the files are sorted,

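use strict;
use warnings;

# File names are assumed here; the original snippet left $fh1 and $fh2 undefined.
open(my $fh1, '<', 'file1.txt') or die "Cannot open file1.txt: $!";
open(my $fh2, '<', 'file2.txt') or die "Cannot open file2.txt: $!";

# Read the next line and return its (key, value) pair, or an empty list at EOF.
sub get {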
   my ($fh) = @_;
   my $line = <$fh>;
   return () if !defined($line);
   return split(' ', $line);
}

my ($key1, $val1) = get($fh1);
my ($key2, $val2) = get($fh2);

while (defined($key1) && defined($key2)) {
   if ($key1 lt $key2) {
       print(join("\t", $key1, $val1), "\n");
       ($key1, $val1) = get($fh1);
   }
   elsif ($key1 gt $key2) {
       print(join("\t", '', '', $key2, $val2), "\n");
       ($key2, $val2) = get($fh2);
   }
   else {
       print(join("\t", $key1, $val1, $key2, $val2), "\n");
       ($key1, $val1) = get($fh1);
       ($key2, $val2) = get($fh2);
   }
}

while (defined($key1)) {
   print(join("\t", $key1, $val1), "\n");
   ($key1, $val1) = get($fh1);
}

while (defined($key2)) {
   print(join("\t", '', '', $key2, $val2), "\n");
   ($key2, $val2) = get($fh2);
}
answered 2012-05-03T19:21:33.240

Similar to Joel Berger's answer, but this approach lets you keep track of which files contain a given key:

my %data;

while (my $line = <>){
    chomp $line;
    my ($k)          = $line =~ /^(\S+)/;
    $data{$k}{line}  = $line;
    $data{$k}{$ARGV} = 1;
}

use Data::Dumper;
print Dumper(\%data);

Output:

$VAR1 = {
  'CCC' => {
    'other.dat' => 1,
    'data.dat' => 1,
    'line' => 'CCC 333'
  },
  'BBB' => {
    'other.dat' => 1,
    'data.dat' => 1,
    'line' => 'BBB 345'
  },
  'DDD' => {
    'other.dat' => 1,
    'line' => 'DDD 444'
  },
  'AAA' => {
    'data.dat' => 1,
    'line' => 'AAA 123'
  }
};
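From a structure like this, the two-column layout requested in the question could be produced along the following lines. This is only a sketch: the helper name `align_rows` is hypothetical, and the sample data is copied from the dump above, with `data.dat` taken as the left-hand file and `other.dat` as the right-hand one.

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-separated row per key, filling the
# left columns when the key was seen in $left and the right columns when
# it was seen in $right.
sub align_rows {
    my ($data, $left, $right) = @_;
    my @rows;
    for my $key (sort keys %$data) {
        my $entry  = $data->{$key};
        my @fields = split ' ', $entry->{line};
        my @blank  = ('') x @fields;      # empty placeholders for a missing left side
        my @l = $entry->{$left}  ? @fields : @blank;
        my @r = $entry->{$right} ? @fields : ();
        push @rows, join("\t", @l, @r);
    }
    return @rows;
}

# Sample data copied from the dump above.
my %data = (
    AAA => { 'data.dat' => 1, line => 'AAA 123' },
    BBB => { 'data.dat' => 1, 'other.dat' => 1, line => 'BBB 345' },
    CCC => { 'data.dat' => 1, 'other.dat' => 1, line => 'CCC 333' },
    DDD => { 'other.dat' => 1, line => 'DDD 444' },
);

print "$_\n" for align_rows(\%data, 'data.dat', 'other.dat');
```

One caveat: the structure keeps a single `line` per key, so if both files contain the same key with different values, only the line read last survives.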
answered 2012-05-03T20:26:52.840