我正在尝试比较两个字符串,作为输出,我想要一个连续相同字符的计数,如果字符不同,则只是第二个字符串中的字符。我有一个有效的递归实现,但我不知道如何将连续计数加在一起
代码:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;
$Data::Dumper::Terse = 1;
my $str1 = "aaaaaaaaaaaabbbbbbbbbbbccccccccdddddddddddeeeefffffff";
my $str2 = "aaaaaaaaaaaabbbbbbbbbbbccccccccxxxxxxxddxxeeeefffffff";
sub find_diff {
my ( $a, $b ) = @_;
my @rtn = ();
my $len = length $a;
my $div = $len / 2;
if ( $div < 1 ) {
return $b;
}
my $a_1 = substr $a, 0, $div;
my $b_1 = substr $b, 0, $div;
if ($a_1 eq $b_1) {
push @rtn, length $a_1;
}
else {
push @rtn, find_diff( $a_1, $b_1 );
}
my $a_2 = substr $a, $div;
my $b_2 = substr $b, $div;
if ($a_2 eq $b_2) {
push @rtn, length $a_2;
}
else {
push @rtn, find_diff( $a_2, $b_2 );
}
return @rtn;
}
print Data::Dumper::Dumper( [ find_diff('xaabbb', 'aaabbc' ) ] ) . "\n";
print Data::Dumper::Dumper( [ find_diff('aaabbb', 'aaabbc' ) ] ) . "\n";
print Data::Dumper::Dumper( [ find_diff( $str1, $str2 ) ] ) . "\n";
输出:
['a',2,1,1,'c']
[3,1,1,'c']
[26,3,1,1,'x','x','x','x','x','x','x',1,1,'x','x',4,7]
期望的输出:
['a',4,'c']
[5,'c']
[31,'x','x','x','x','x','x','x',2,'x','x',11]
当然,我可以将字符拆分为一个数组,unpack
然后相当容易地计算连续匹配,但我想尝试一种分而治之的方法,以便比较性能。
谢谢!
编辑——在递归情况下,通过返回一个嵌套数组然后减少来解决它。令人惊讶的是,它并没有那么慢:
sub find_diff {
my ( $a, $b ) = @_;
my @rtn = ();
my $len = length $a;
if ( $len < 2 ) {
return [$b, 0];
}
my $div = $len / 2;
my $a_1 = substr $a, 0, $div;
my $b_1 = substr $b, 0, $div;
if ($a_1 eq $b_1) {
push @rtn, [length $a_1, 1];
}
else {
push @rtn, find_diff( $a_1, $b_1 );
}
my $a_2 = substr $a, $div;
my $b_2 = substr $b, $div;
if ($a_2 eq $b_2) {
push @rtn, [length $a_2, 1];
}
else {
push @rtn, find_diff( $a_2, $b_2 );
}
return @rtn;
}
sub compress_string {
my ($a, $b) = @_;
my @list = find_diff($a, $b);
my $acc = 0;
my @result = ();
foreach my $item (@list) {
if ( $item->[1] ) {
$acc += $item->[0];
} else {
push @result, if $acc;
push @result, $item->[0];
$acc = 0;
}
}
push @result, $acc if $acc;
return @result;
}
结果符合我想要的。
更新 - 性能统计
这真的很有趣。使用unpack( 'C*', $string)
速度非常快,我认为这就是为什么我的迭代版本如此之快。递归的速度优势来自更长的字符串(434 个字符)
Rate short_recurse_borodin short_recurse short_array_borodin short_array_sodved short_array
short_recurse_borodin 6944/s -- -31% -36% -73% -84%
short_recurse 10091/s 45% -- -8% -61% -76%
short_array_borodin 10929/s 57% 8% -- -57% -74%
short_array_sodved 25707/s 270% 155% 135% -- -40%
short_array 42553/s 513% 322% 289% 66% --
Rate mid_array_borodin mid_recurse_borodin mid_string mid_array_sodved mid_array
mid_array_borodin 1418/s -- -28% -56% -65% -82%
mid_recurse_borodin 1972/s 39% -- -39% -52% -76%
mid_recurse 3226/s 127% 64% -- -21% -60%
mid_array_sodved 4082/s 188% 107% 27% -- -49%
mid_array 8065/s 469% 309% 150% 98% --
Rate long_array_borodin long_array_sodved long_recurse_borodin long_array long_string
long_array_borodin 172/s -- -67% -80% -85% -89%
long_array_sodved 513/s 199% -- -40% -55% -67%
long_recurse_borodin 854/s 397% 66% -- -25% -45%
long_array 1142/s 564% 122% 34% -- -26%
long_recurse 1546/s 800% 201% 81% 35% --