我有两个文件(文件 A 和文件 B),格式如下。我想匹配两个文件中的某些数据模式并进行匹配。我下面的编码使用了很长时间来生成结果。除此之外,导致提取不完整的地方是错误的。任何替代方法或改进?
我从两个文件中提取了每行名称和分数,并将它们存储在两个输出文件中。每个输出文件都包含提取的名称和分数。首先,如果文件 A 中的分数为负值,请忽略特定的行提取。否则,如果文件 A 中的分数为正值,则将文件 A 的名称与文件 B 匹配。将生成三个条件和三个结果报告(pass.rpt、fail.rpt 和 noCheck.rpt)。
对于那些匹配的名称,它将继续进行比较。如果文件 A 分数 > 50 且文件 B 分数 > 40,则打印匹配的名称、文件 A 中的分数(分数 A)和文件 B 中的分数(分数 B)到 pass.rpt 和 pass_counter($pc),每次比较时加一。否则,如果 <50 和 <40,打印匹配的名称、得分 A 和得分 B 到 fail.rpt 和 fail_counter($fc) 加一。
最后一个条件是针对文件 A 中的那些负值。如果两个文件中的名称匹配,则将 name、scoreA 和 score B 打印到 noCheck.rpt 和 noCheck_counter 加一。
文件 A
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
报告:学生A-
科学-
数学-
语言。
日期 : Fri Jul 19 17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
命名科学数学语言。分数
Jane_let [0] (sa) 58.78 r 66.15 0.00 -33
Alfert_pipe (sa) 74.72 r 92.72 0.00 82
Olive_pipe [8] (sa) 64.28 f 25.40 0.00 58
质量/excel/i60 86.21 r 59.90 0.00 68
Anne_let (sa) 51.98 f 12.69 0.00 -39
yuki/099/管道 76.52 r 94.32 0.00 -82
弗雷/让/sa/y589 47.79 f 99.00 0.00 78
艾伦/excel/sa/y589 97.00 f 96.00 0.00 -70
......
_
文件 B
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
报告:学生B-
科学-
数学-
语言。
日期 : Fri Jul 19 17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
命名科学数学语言。分数
Ash_let [9] (sa) 58.78 r 66.15 0.00 33
Alfert_pipe (sa) 74.72 r 92.72 0.00 57
Olive_pipe [8] (sa) 64.28 f 25.40 0.00 20
质量/excel/i60 86.21 r 59.90 0.00 16
Sam_let (sa) 51.98 f 12.69 0.00 -39
yuki/099/管道 76.52 r 94.32 0.00 82
弗雷/让/sa/y589 47.79 f 99.00 0.00 30
艾伦/excel/sa/y589 67.00 f 96.00 0.00 -90
......
_
编码:
use Getopt::Long qw(:config no_ignore_case);
use Data::Dumper;
use POSIX qw(floor);
use strict;
use warning;
my $orig = '';
my $new = '';
GetOptions('orig=s' => \$orig, 'new=s' => \$new);
if (!$orig|!$new) {
print "\n\t Help: test.pl -orig <file A> -new <file B>\n";
exit;
}
open (PASS, ">pass.rpt") || die "ERROR: cannot open";
open (FAIL, ">fail.rpt") || die "ERROR: cannot open";
open (NC, ">noCheck.rpt") || die "ERROR: cannot open";
open (t1, ">t1") || die "ERROR: cannot open";
open (t2, ">t2") || die "ERROR: cannot open";
my (@array, $line, $end1, $slack1, $b1, $THIS, @arr1, @arr2, @tmp1, @tmp2, @emp, @emp2, $data1, $data2,$emp1,$emp2,$emp3,$emp4,$ep1,$s1,$ep2,$s2,$ncc,$pc,$fc);
$ncc = 0;
$pc = 0;
$fc = 0;
fileA_ext();
fileB_ext();
check();
#_______________________________________________________________________________________________
sub fileA_ext() {
if ($orig =~ /\S+\.gz$/) {
open (FileA,"gunzip -c $orig |") || die "ERROR: can't read $orig\n";
} else {
open (FileA,"$orig") || die "ERROR: can't read $orig\n";
}
while (@array = <FileA>) {
foreach $line(@array) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
if ($line !~ m/\((sa)\)/) {
@arr1 = @emp;
next if ($line =~ m/Name/);
$name1 = "$1";
$score1 = "$12";
my $data1 = join(";",$name1,$score1);
push (@arr1, $data1);
}
if ($line =~ m/\((sa)\)/) {
@arr1 = @emp2;
@tmp1 = @emp;
next if ($line =~ m/Name/);
push (@tmp1, $line);
#print t "@tmp1\n";
foreach $line (@tmp1) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
my $name2 = "$1";
substr($name2, -13) = '';
my $score2 = "$12";
my $data1 = join(";",$name2,$score2);
push (@arr1, $data1);
$name2 = $score2 ="";
#print "@arr1\n\n";
}
}
}
print t1 "@arr1\n\n";
}
}
}
close (FileA);
}
#____________________________________________________________________________________________
sub FileB_ext() {
if ($new =~ /\S+\.gz$/) {
open (FileB,"gunzip -c $new |") || die "ERROR: $THIS can't read $new\n";
} else {
open (FileB,"$new") || die "ERROR: $THIS can't read $new\n";
}
while (@array = <FileB>) {
foreach $line(@array) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
#print "$line\n";
if ($line !~ m/\((sa)\)/) {
@arr2 = @emp;
next if ($line =~ m/Name/);
my $name3 = "$1";
my $score3 = "$12";
my $data2 = join(";",$name3,$score3);
push (@arr2, $data2);
}
if ($line =~ m/\((sa)\)/) {
@arr2 = @emp2;
@tmp2 = @emp;
next if ($line =~ m/Name/);
push (@tmp2, $line);
#print t "@tmp2\n";
foreach $line (@tmp2) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
my $name4 = "$1";
substr($name4, -13) = '';
my $score4 = "$12";
my $data2 = join(";",$name4,$score4);
push (@arr2, $data2);
$name4 = $score4 ="";
#print "@arr2\n\n";
}
}
}
print t2 "@arr2\n\n";
}
}
}
close (FileB);
}
sub check() {
foreach $data1 (@arr1) {
if ($data1 ne ""){
if ($data1 =~ m/(.*)\;(.*)/) {
$ep1 = $emp1;
$s1 = $emp2;
my $ep1 = "$1";
my $s1 = "$2";
#print r "$ep1 $s1\n\n";
foreach $data2 (@arr2) {
if ($data2 ne "") {
if ($data2 =~ m/(.*)\;(.*)/) {
$ep2 = $emp3;
$s2 = $emp4;
my $ep2 = "$1";
my $s2 = "$2";
#print R "$ep2 $s2\n";
if ( $ep1 eq $ep2 && $s1 =~ m/-/g) {
$ncc++;
#print NC "Total match: $ncc\n\n";
print NC "$ep1 $s1 $s2\n";
}
if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 > 50 && $s2 > 40) {
$pc++;
print PASS "$ep1 $s1 $s2\n";
}
if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 < 50 && $s2 < 40) {
$fc++;
print FAIL "$ep1 $s1 $s2\n";
}
}
}
}
}
}
}
print NC "\nTotal match: $ncc\n\n";
print PASS "\nTotal match: $pc\n\n";
print FAIL "\nTotal match: $fc\n\n";
}
预期结果:
通过.rpt
名称 scoreA scoreB
Alfert_pipe (sa) 82 57
失败.rpt
Olive_pipe [8] (sa) 58 20
质量/excel/i60 68 16
弗雷/让/sa/y589 78 30
noCheck.rpt
yuki/099/管道 -82 82
艾伦/excel/sa/y589 -70 -90