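I have two tab-delimited files that I need to align. For example:

```
File 1:      File 2:
AAA 123      BBB 345
BBB 345      CCC 333
CCC 333      DDD 444
```

(These are large files, potentially thousands of lines!)

What I'd like is output that looks like this:

```
AAA 123
BBB 345      BBB 345
CCC 333      CCC 333
             DDD 444
```

Ideally I'd like to do this in Perl, but I'm not sure how. Any help would be greatly appreciated.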
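If it's just a matter of building a data structure, this is fairly easy:

```perl
#!/usr/bin/env perl
# usage: script.pl file1 file2 ...
use strict;
use warnings;

my %data;
while (<>) {
    chomp;
    my ($key, $value) = split;      # whitespace-separated key and value
    next unless defined $key;       # skip blank lines
    push @{$data{$key}}, $value;    # collect every value seen for this key
}

use Data::Dumper;
print Dumper \%data;
```

You can then print it in whatever format you like. If you really need to work with the files exactly as they are, it gets a bit trickier.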
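One way to get the aligned layout from a structure like this is to store each value at its file's position instead of pushing it, so a key missing from one file leaves a gap. This is a minimal sketch, assuming whitespace-separated `key value` lines with each key appearing at most once per file; `align_files` is a hypothetical name:

```perl
#!/usr/bin/env perl
# usage: script.pl file1.txt file2.txt
use strict;
use warnings;

# Reads "key value" lines from each filehandle and returns one
# tab-separated output line per key, sorted by key.
sub align_files {
    my @fhs = @_;
    my %data;    # key => [ value from file 0, value from file 1, ... ]
    for my $i (0 .. $#fhs) {
        my $fh = $fhs[$i];
        while (my $line = <$fh>) {
            chomp $line;
            my ($key, $value) = split ' ', $line;
            next unless defined $key;    # skip blank lines
            $data{$key}[$i] = $value;    # slot $i records which file it came from
        }
    }

    my @rows;
    for my $key (sort keys %data) {
        my @cols;
        for my $i (0 .. $#fhs) {
            my $value = $data{$key}[$i];
            # present: "key<TAB>value"; absent: two empty columns
            push @cols, defined $value ? ($key, $value) : ('', '');
        }
        my $row = join "\t", @cols;
        $row =~ s/\t+\z//;               # drop trailing empty columns
        push @rows, $row;
    }
    return @rows;
}

if (@ARGV) {
    my @fhs;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        push @fhs, $fh;
    }
    print "$_\n" for align_files(@fhs);
}
```

Because rows are emitted in sorted key order, they line up even if the two files list their keys in different orders; the trade-off is that every key is held in memory at once.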
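As ikegami pointed out, this assumes the files' contents line up exactly as in your example:

```perl
use strict;
use warnings;

open my $file1, '<', 'file1.txt' or die $!;
open my $file2, '<', 'file2.txt' or die $!;

# The first line of file 1 has no partner in file 2, so print it alone.
my $file1_line = <$file1>;
print $file1_line;

while ( my $file2_line = <$file2> ) {
    # Pair the next file 1 line (if any) with the current file 2 line.
    if ( defined( $file1_line = <$file1> ) ) {
        chomp $file1_line;
        print $file1_line;
    }
    # Once file 1 is exhausted, pad two empty columns to keep file 2 aligned.
    my $tabs = $file1_line ? "\t" : "\t\t";
    print "$tabs$file2_line";
}

close $file1;
close $file2;
```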
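Looking at your example, some identical key/value pairs appear in both files. Given that, it seems you want to show the pairs unique to file 1, the pairs unique to file 2, and the pairs common to both. If that's the case (and you aren't trying to match pairs across the files by key or by value), you can use List::Compare:

```perl
use strict;
use warnings;
use List::Compare;

open my $file1, '<', 'file1.txt' or die $!;
my @file1 = <$file1>;
close $file1;

open my $file2, '<', 'file2.txt' or die $!;
my @file2 = <$file2>;
close $file2;

my $lc = List::Compare->new(\@file1, \@file2);

my @file1Only = $lc->get_Lonly;          # L(eft array) only
for (@file1Only) { print }               # lines still carry their newline

my @bothFiles = $lc->get_intersection;   # lines present in both files
for (@bothFiles) { chomp; print "$_\t$_\n" }

my @file2Only = $lc->get_Ronly;          # R(ight array) only
for (@file2Only) { print "\t\t$_" }      # pad two empty columns
```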
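If List::Compare isn't installed, the same three-way split can be sketched with core Perl hashes used as sets. This is an illustration of the idea, not how List::Compare works internally; `compare_lines` is a hypothetical helper, and it keeps each file's original line order:

```perl
#!/usr/bin/env perl
# usage: script.pl file1.txt file2.txt
use strict;
use warnings;

# Split two lists of lines into left-only, common and right-only lines,
# comparing whole lines as the List::Compare version does.
sub compare_lines {
    my ($left, $right) = @_;
    my (%in_left, %in_right);
    @in_left{@$left}   = ();    # hash slices used as sets
    @in_right{@$right} = ();
    my @left_only  = grep { !exists $in_right{$_} } @$left;
    my @both       = grep {  exists $in_right{$_} } @$left;
    my @right_only = grep { !exists $in_left{$_}  } @$right;
    return (\@left_only, \@both, \@right_only);
}

if (@ARGV == 2) {
    my @lists;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        chomp(my @lines = <$fh>);    # so a missing final newline can't break a match
        push @lists, \@lines;
    }
    my ($left_only, $both, $right_only) = compare_lines(@lists);
    print "$_\n"     for @$left_only;
    print "$_\t$_\n" for @$both;
    print "\t\t$_\n" for @$right_only;
}
```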
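Assuming both files are sorted by key, you can merge them line by line, which keeps only one line from each file in memory at a time:

```perl
#!/usr/bin/env perl
# usage: script.pl file1.txt file2.txt   (both files sorted by key)
use strict;
use warnings;

my ($file1, $file2) = @ARGV;
open my $fh1, '<', $file1 or die "Cannot open $file1: $!";
open my $fh2, '<', $file2 or die "Cannot open $file2: $!";

# Read the next "key value" pair; returns an empty list at end of file.
sub get {
    my ($fh) = @_;
    my $line = <$fh>;
    return () if !defined($line);
    return split(' ', $line);
}

my ($key1, $val1) = get($fh1);
my ($key2, $val2) = get($fh2);

# Merge step: emit the smaller key, or both when the keys match.
while (defined($key1) && defined($key2)) {
    if ($key1 lt $key2) {       # only in file 1
        print(join("\t", $key1, $val1), "\n");
        ($key1, $val1) = get($fh1);
    }
    elsif ($key1 gt $key2) {    # only in file 2
        print(join("\t", '', '', $key2, $val2), "\n");
        ($key2, $val2) = get($fh2);
    }
    else {                      # in both files
        print(join("\t", $key1, $val1, $key2, $val2), "\n");
        ($key1, $val1) = get($fh1);
        ($key2, $val2) = get($fh2);
    }
}

# Drain whichever file still has lines left.
while (defined($key1)) {
    print(join("\t", $key1, $val1), "\n");
    ($key1, $val1) = get($fh1);
}
while (defined($key2)) {
    print(join("\t", '', '', $key2, $val2), "\n");
    ($key2, $val2) = get($fh2);
}
```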
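The merge silently misaligns rows if either input isn't sorted in the same string order that Perl's `lt` uses. One possible safeguard is a reader that checks the order as it goes; `make_sorted_reader` is a hypothetical replacement for `get`, sketched under that assumption:

```perl
use strict;
use warnings;

# Returns a closure that yields the next (key, value) pair from $fh,
# skips blank lines, and dies if a key sorts before the previous one,
# since the merge loop relies on sorted input.
sub make_sorted_reader {
    my ($fh, $name) = @_;
    my $prev;
    return sub {
        my ($key, $value);
        while (1) {
            my $line = <$fh>;
            return () if !defined $line;    # end of file
            ($key, $value) = split ' ', $line;
            last if defined $key;           # skip blank lines
        }
        die "$name is not sorted: '$key' comes after '$prev'\n"
            if defined $prev && $key lt $prev;
        $prev = $key;
        return ($key, $value);
    };
}
```

In the merge script you would create `my $next1 = make_sorted_reader($fh1, $file1);` (and likewise for file 2) and replace each `get($fh1)` call with `$next1->()`.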
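Similar to Joel Berger's answer, but this approach also lets you track which files contain a given key:

```perl
#!/usr/bin/env perl
# usage: script.pl data.dat other.dat
use strict;
use warnings;

my %data;
while (my $line = <>) {
    chomp $line;
    my ($k) = $line =~ /^(\S+)/ or next;    # first field is the key; skip blank lines
    $data{$k}{line}  = $line;
    $data{$k}{$ARGV} = 1;                    # $ARGV holds the name of the current file
}

use Data::Dumper;
print Dumper(\%data);
```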
Output, with File 1 saved as data.dat and File 2 as other.dat:

```
$VAR1 = {
          'CCC' => {
                     'other.dat' => 1,
                     'data.dat' => 1,
                     'line' => 'CCC 333'
                   },
          'BBB' => {
                     'other.dat' => 1,
                     'data.dat' => 1,
                     'line' => 'BBB 345'
                   },
          'DDD' => {
                     'other.dat' => 1,
                     'line' => 'DDD 444'
                   },
          'AAA' => {
                     'data.dat' => 1,
                     'line' => 'AAA 123'
                   }
        };
```
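To turn this structure into the aligned output you also need the file order, which is gone once `<>` has shifted the names off `@ARGV`. A minimal sketch that keeps the file list; `align_by_file` is a hypothetical name, and it assumes a key has the same line in every file that contains it, since only one `line` is stored per key:

```perl
#!/usr/bin/env perl
# usage: script.pl data.dat other.dat
use strict;
use warnings;

# Builds the %data structure shown above, then returns one aligned,
# tab-separated row per key, with the files in the order given.
sub align_by_file {
    my @files = @_;
    my %data;
    {
        local @ARGV = @files;    # let <> read these files in order
        while (my $line = <>) {
            chomp $line;
            my ($k) = $line =~ /^(\S+)/ or next;
            $data{$k}{line}  = $line;
            $data{$k}{$ARGV} = 1;
        }
    }

    my @rows;
    for my $k (sort keys %data) {
        # each file contributes its "key<TAB>value" line, or an empty
        # key/value pair ("\t") when the key is missing from that file
        my @cols = map { $data{$k}{$_} ? $data{$k}{line} : "\t" } @files;
        my $row = join "\t", @cols;
        $row =~ s/\t+\z//;       # drop trailing empty columns
        push @rows, $row;
    }
    return @rows;
}

if (@ARGV) {
    print "$_\n" for align_by_file(@ARGV);
}
```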