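I am working with two large datasets (300 x 500,000). Each is a matrix containing the values 0, 1, 2 and NA. I want to compare the two files, count the matching values in each row of the two files, and write the results to an output table.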
File 1
2 1 0
0 1 1
1 0 NA
File 2
2 1 0
Na 1 1
1 NA 0
How can I compare the count and sum of matching values in each row?
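I have made my own interpretation of what you mean by "total", and the counts of matching rows are simply dumped, but this does what you asked, and you should be able to adapt it to your exact specification.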
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# open both files (3-argument open, with error checking)
open(my $f1, '<', 'file1') or die "file1: $!";
open(my $f2, '<', 'file2') or die "file2: $!";

# hash to store count of matching rows, keyed by row content
my %match_count = ();
# total sum
my $total = 0;

# read one line from each file in step; stop when either file runs out
while (defined(my $l1 = <$f1>)) {
    my $l2 = <$f2>;
    last unless defined $l2;
    # lower case both lines to ignore the Na/NA difference and
    # chomp to remove \n so this isn't stored
    $l1 = lc $l1;
    $l2 = lc $l2;
    chomp($l1);
    chomp($l2);
    # see if lines are the same
    if ($l1 eq $l2) {
        # increment counter for this line
        $match_count{$l1}++;
        # sum the numeric fields of the row (skipping "na") and add to total
        $total += $_ for grep { /^\d+$/ } split ' ', $l1;
    }
}
close($f1);
close($f2);

print "sum total of matches = $total\n";
print Dumper(\%match_count);
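
The script above only counts rows that are identical as a whole. If what you want is the number of matching values within each row, a minimal sketch could look like the following. The helper name count_row_matches is made up for illustration, and two things are assumptions on my part: NA is compared case-insensitively, and an NA in the same position in both files counts as a match. The rows are the sample data from the question, hard-coded so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the positions where two rows hold the same value.
# Assumption: values are compared as lower-cased strings, so "NA" equals "Na",
# and an NA/NA pair in the same column counts as a match.
sub count_row_matches {
    my ($row1, $row2) = @_;
    my @a = split ' ', lc $row1;
    my @b = split ' ', lc $row2;
    # compare only the columns that both rows have
    my $n = scalar(@a) < scalar(@b) ? scalar(@a) : scalar(@b);
    my $matches = 0;
    for my $i (0 .. $n - 1) {
        $matches++ if $a[$i] eq $b[$i];
    }
    return $matches;
}

# Sample rows from the question
my @file1 = ("2 1 0", "0 1 1", "1 0 NA");
my @file2 = ("2 1 0", "Na 1 1", "1 NA 0");

my $total = 0;
for my $i (0 .. $#file1) {
    my $m = count_row_matches($file1[$i], $file2[$i]);
    $total += $m;
    print "row ", $i + 1, ": $m matching values\n";
}
print "total matching values: $total\n";
```

For the real files you would call the helper inside the while loop that reads one line from each file, rather than on hard-coded arrays. If NA should never count as a match, skip a position whenever either value is "na" before comparing.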