I had previously done some simpler analysis of diff output, so I had a Perl script that gave me a basis to start from. Consider the following two data files, file.1 and file.2.
file.1
Data
Foo
Bar 1
Baz
I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
-(.*)\n\+\1\n
The problems start when I have multi-line matches like:
Foo 2
Bar 2
Etc.
file.2
Data
Foo
Bar 10
Baz
I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
-(.*)\n\+\1\n
The problems start when I have multi-line matches like:
Foo 20
Bar 20
Etc.
Raw diff output
The raw unified diff output is:
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar 1
+Bar 10
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo 2
-Bar 2
+Foo 20
+Bar 20
 Etc.
Post-processed output
Now, after post-processing, every string of digits has been replaced by ##, so the post-processed file looks like this:
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar ##
+Bar ##
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo ##
-Bar ##
+Foo ##
+Bar ##
 Etc.
This is the input to the program that will analyze whether the differences are still real.
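How the digits were masked is not shown here, so the following is only a rough sketch of one way to produce input of that shape. It assumes a sed that supports -E (GNU and BSD sed both do), and the function name mask_numbers is hypothetical. It leaves the ---, +++ and @@ lines alone and replaces each run of digits on every other line with ##.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original answer: read a unified
# diff on standard input and mask every run of digits with '##', except
# on the file header lines (---, +++) and the hunk header lines (@@).
# Caveat: a removed line whose own text starts with "-- " looks like a
# file header ("--- ...") and would be left unmasked by this sketch.
mask_numbers() {
    sed -E '/^(--- |\+\+\+ |@@ )/!s/[0-9]+/##/g'
}
```

With that function defined, a pipeline such as diff -u file.1 file.2 | mask_numbers > data would probably give a file like the one shown above, ready to feed to the checking script.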
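To be really useful, we have to isolate the header lines (--- and +++) and keep them. For each block of differences starting with @@, we need to capture the adjacent - and + lines, and:
- Check that the number of - lines is the same as the number of + lines.
- Check that the content of the - lines is the same as the content of the + lines.
- Remember that, although the sample data doesn't show it, you can have multiple blocks of - and + lines within a single @@ section.
- If there are no differences left in an @@ block, the whole block can be discarded.
- If there are differences, we need to output the header lines if they have not been output before.
- If there are differences, output the whole block of differences.
Rinse and repeat.
My language of choice for this is Perl.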
checkdiffs.pl
#!/usr/bin/env perl
use strict;
use warnings;

use constant debug => 0;

my $file1;
my $file2;
my $header = 0;

OUTER:
while (my $line = <>)
{
    chomp $line;
    print "[$line]\n" if debug;
    if ($line =~ m/^--- /)
    {
        $file1 = $line;
        $file2 = <>;
        chomp $file2;
        print "[$file2]\n" if debug;
        if ($file2 !~ m/^\+\+\+ /)
        {
            print STDERR "Unexpected file identification lines\n";
            print STDERR "$file1\n";
            print STDERR "$file2\n";
            next OUTER;
        }
        $header = 0;    # Have not output file header yet
        my @lines;
        my $atline;
        last OUTER unless defined($line = <>);
        INNER:
        while ($line =~ m/^@@ /)
        {
            chomp $line;
            print "@[$line]\n" if debug;
            $atline = $line;
            @lines = ();
            while (defined($line = <>) && $line =~ m/^[- +]/)
            {
                chomp $line;
                print ":[$line]\n" if debug;
                push @lines, $line;
            }
            # Got a complete @@ block of diffs
            post_process($atline, @lines);
            last OUTER if !defined($line);
            next INNER if ($line =~ m/^@@ /);
            print STDERR "Unexpected input line: [$line]\n";
            last OUTER;
        }
    }
}
sub differences
{
    my($pref, $mref) = @_;
    my $pnum = scalar(@$pref);
    my $mnum = scalar(@$mref);
    print "-->> differences\n" if debug;
    return 0 if ($pnum == 0 && $mnum == 0);
    return 1 if ($pnum != $mnum);
    foreach my $i (0..($pnum-1))
    {
        # Compare the line contents without the leading + or - marker
        my $pline = substr(${$pref}[$i], 1);
        my $mline = substr(${$mref}[$i], 1);
        return 1 if ($pline ne $mline);
    }
    print "<<-- differences\n" if debug;
    return 0;
}
sub post_process
{
    my($atline, @lines) = @_;

    print "-->> post_process\n" if debug;

    # Work out whether there are any differences left
    my @plines = ();    # +lines
    my @mlines = ();    # -lines
    my $diffs = 0;
    my $ptype = ' ';    # Previous line type

    foreach my $line (@lines)
    {
        print "---- $line\n" if debug;
        my ($ctype) = ($line =~ m/^(.)/);
        if ($ctype eq ' ')
        {
            if (($ptype eq '-' || $ptype eq '+') && differences(\@plines, \@mlines))
            {
                $diffs = 1;
                last;
            }
            @plines = ();
            @mlines = ();
        }
        elsif ($ctype eq '-')
        {
            push @mlines, $line;
        }
        elsif ($ctype eq '+')
        {
            push @plines, $line;
        }
        else
        {
            print STDERR "Unexpected input line format: $line\n";
            exit 1;
        }
        $ptype = $ctype;
    }

    $diffs = 1 if differences(\@plines, \@mlines);

    if ($diffs != 0)
    {
        # Print the block of differences, preceded by file header if necessary
        if ($header == 0)
        {
            print "$file1\n";
            print "$file2\n";
            $header = 1;
        }
        print "$atline\n";
        foreach my $line (@lines)
        {
            print "$line\n";
        }
    }

    print "<<-- post_process\n" if debug;
    return;
}
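I tested it with the post-processed diff saved as the file data, and with three variants of it (data.0, data.1 and data.2) in which one or more ## had been changed to #0 so that a real difference remains. The unmodified data file produces no output, because every difference in it has been squashed: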
$ perl checkdiffs.pl data
$ perl checkdiffs.pl data.0
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar #0
+Bar ##
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
$ perl checkdiffs.pl data.1
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo #0
-Bar ##
+Foo ##
+Bar ##
 Etc.
$ perl checkdiffs.pl data.2
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar #0
+Bar ##
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo ##
-Bar #0
+Foo ##
+Bar ##
 Etc.
$
Does that meet your requirements?