regex - Remove specific array elements based on another array

Question

Problem: I have two arrays, as shown below.

my @arr1 = qw( jon won don pon );
my @arr2 = qw( son kon bon won kon don pon won pon don won);

I need to remove from @arr2 the first matching element of @arr1. In the example above, that means removing won from @arr2.

My current logic is as follows.

#!/usr/bin/perl
my @arr1 = qw( jon won don pon );
my @arr2 = qw( son kon bon won kon don pon won pon don won);
my @remove_indices = ();
my $remove_element;
my $first_remove_index;
OUTER_FOR: for my $i (0..@arr2) {
    $outer_element = $arr2[$i];
    foreach my $innr_element ( @arr1 ) {
        if($innr_element eq $outer_element) {
            push(@remove_indices, $i);
            $first_remove_index = $i;
            $remove_element = $innr_element;
            last OUTER_FOR;
        }
    }
}

for my $i ($first_remove_index+1..@arr2) {
    $outer_element = $arr2[$i];
    if($remove_element eq $outer_element) {
        push(@remove_indices, $i);
    }
}

if (@remove_indices > 0) {
    map {splice (@arr2, $_, 1)} reverse(@remove_indices);
}

print "@arr2";

But this looks like typical C/C++-style logic. I cannot use a hash. Is there a more Perl-like way to do this?

score 1 · Accepted Answer

Here is another way (assuming you want to remove all occurrences of the first matching element):

use strict;
use warnings;

my @arr1 = qw(jon won don pon);
my @arr2 = qw(son kon bon won kon don pon won pon don won);

for my $elem (@arr2){
    if(grep { $_ eq $elem } @arr1){
        @arr2 = grep { $_ ne $elem } @arr2;
        last;
    }
}

print "@arr2";

Output:

son kon bon kon don pon pon don
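
The same idea can also be written by separating "find the element" from "remove it". The following is only a sketch, assuming the core List::Util module is available; the name $target is chosen here for illustration and does not come from the answer.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @arr1 = qw(jon won don pon);
my @arr2 = qw(son kon bon won kon don pon won pon don won);

# Find the first element of @arr2 that also occurs in @arr1.
my $target = first { my $elem = $_; grep { $_ eq $elem } @arr1 } @arr2;

# Drop every occurrence of it; leave @arr2 untouched if nothing matched.
@arr2 = grep { $_ ne $target } @arr2 if defined $target;

print "@arr2\n";
```

With the sample data this prints the same list as above. The defined check means @arr2 stays unchanged when the two arrays share no element.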

score 1 · Accepted Answer

Virus, you can certainly use a hash to solve this.

If you grow one or both of the arrays in your example, a hash will actually cost you less CPU time.

use strict;

my @arr1 = qw(jon won don pon);
my @arr2 = qw(son kon bon won kon don pon won pon don won);

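my $i = 0;
my %h;
# Map each value in @arr2 to the list of indices where it occurs.
for (@arr2) { push @{ $h{$_} }, $i++ }
# $a is reserved for sort, so use a different loop variable.
for my $name (@arr1) {
    if (exists $h{$name}) {
        # Blank every occurrence of the first name found, then stop.
        for (@{ $h{$name} }) {
            $arr2[$_] = '';
        }
        last;
    }
}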

@arr2 = grep { length } @arr2;
print "@arr2\n";

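I was curious about the relative efficiency of the different proposed solutions, so I wrote a test program to try them out. You will be pleased to know that on your test data your program is as good as any of them. But not once the arrays start to grow!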

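What follows is a bit over the top, I know, but never mind. If you plan to run your application countless times a day, you may benefit from doing some benchmarking on your own hardware. So bear with me :)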

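Below are the relative CPU times for each of the 5 solutions, in the order they were posted above (the most economical is shown as "1"). The first result column holds the CPU time for the data used in your example. The next three columns come from extending one or both arrays by prepending 50 elements whose values are not found in the other array.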

First array                @arr1     @arr1   @arr1+50  @arr1+50
Second array               @arr2   @arr2+50    @arr2   @arr2+50

Your program                 1         7         3        45
Grep approach 1              1         6         3        43
Grep approach 2              3         9        33       160
Convert to string            2         4         3         6
Using hash                   2         6         2         7

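On the test data, the hash solution uses twice as much CPU as yours. If you extend the second array by 50 elements, the hash fares a little better: your CPU time rises to 7, while the hash approach goes from 2 to 6. But when both arrays are larger, your program needs 45 times the CPU time it needed for the original data, whereas the hash program needs only 3.5 times as much (from 2 to 7).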

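Obviously all of them need more CPU time as the arrays grow, but at different rates, and the slowdown is not linear. They could probably all be tuned a little, and I expect the results to vary across hardware platforms, but these figures should give a reasonable indication of their relative efficiency. Here are the timings when the original arrays grow by 100 elements instead of 50.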

First array                @arr1     @arr1  @arr1+100  @arr1+100
Second array               @arr2  @arr2+100    @arr2   @arr2+100

Your program                 1        12         5       173
Grep approach 1              1        12         4       162
Grep approach 2              3        16        68       612
Convert to string            2         7         5        12
Using hash                   2        11         3        12
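
The test program behind these tables is not shown. For anyone who wants to run a comparison of this kind on their own hardware, the following is a minimal sketch using the core Benchmark module. The wrapper names with_grep and with_hash, and the iteration count, are assumptions made for illustration; they are not the code that produced the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @arr1 = qw(jon won don pon);
my @arr2 = qw(son kon bon won kon don pon won pon don won);

# Hypothetical wrappers: each takes the two arrays by reference and
# returns the filtered list without modifying its inputs.
sub with_grep {
    my ($names, $words) = @_;
    for my $elem (@$words) {
        return grep { $_ ne $elem } @$words
            if grep { $_ eq $elem } @$names;
    }
    return @$words;
}

sub with_hash {
    my ($names, $words) = @_;
    my %seen = map { $_ => 1 } @$words;
    for my $name (@$names) {
        return grep { $_ ne $name } @$words if $seen{$name};
    }
    return @$words;
}

# Run each candidate a fixed number of times and print a comparison table.
cmpthese(20_000, {
    grep => sub { my @r = with_grep(\@arr1, \@arr2) },
    hash => sub { my @r = with_hash(\@arr1, \@arr2) },
});
```

Passing a negative count to cmpthese (for example -2) runs each candidate for at least that many CPU seconds instead of a fixed number of iterations. To mimic the "+50" columns, prepend non-matching elements to either array before timing.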

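So it is a close call between the 4th approach (converting the test array into a single string and running a regex over it, a die-hard Perl programmer's solution) and the 5th approach (using a hash). The "convert to string" approach is marginally better, which many programmers will find counter-intuitive. It is also short and easy to read.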

Bottom line... if you will only be using data sets like the one in your example, your code is fine (although you should fix "(0..@arr2)" at OUTER_FOR: to read "(0..$#arr2)").

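Otherwise, use the 4th or the 5th according to taste. Personally I would go with the "convert to string" program, which is not mine, as long as I am 100% sure that the join character does not appear in the data.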

I think I had better go back to doing something productive now :)
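
On the range fix mentioned in this answer: the right-hand side of a range is evaluated in scalar context, so an array there contributes its element count, which is one past the last valid index. A small sketch with a made-up three-element array shows the difference.

```perl
use strict;
use warnings;

my @arr = qw(a b c);

# @arr on the right of .. is taken as the element count (3),
# so this yields the indices 0, 1, 2 and 3.
my @too_many = (0 .. @arr);

# $#arr is the last valid index (2), so this yields 0, 1 and 2.
my @indices = (0 .. $#arr);

print "0..\@arr  gives @too_many\n";
print "0..\$#arr gives @indices\n";
```

In the question's loop the extra index reads one element past the end of the array, which yields undef and triggers "uninitialized value" warnings once warnings are enabled.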

score 0 · Accepted Answer

You can use a simple for loop and leave it with last as soon as a match is found. Use grep to remove the keyword.

use strict;
use warnings;

my @arr1 = qw( jon won don pon );
my @arr2 = qw( son kon bon won kon don pon won pon don won);
my @out;
for my $word (@arr1) {
    my @new = grep !/^\Q$word\E$/, @arr2;
    if (@new != @arr2) {
        print "'$word' found\n";
        @out = @new;
        last;
    }
}
print "@out";

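Note that I use \Q ... \E to disable any regex metacharacters in the word. The != comparison compares the sizes of the two arrays; when they differ, we know we found a match.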
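
To see why the quoting matters, here is a short sketch with a made-up keyword that contains a dot. The data is hypothetical and only serves to show the difference.

```perl
use strict;
use warnings;

my $word = 'a.c';              # hypothetical keyword containing a metacharacter
my @list = qw(abc a.c xyz);

# Unquoted, the dot matches any character, so "abc" is removed as well.
my @unquoted = grep !/^$word$/, @list;

# With \Q...\E the dot is literal and only "a.c" is removed.
my @quoted = grep !/^\Q$word\E$/, @list;

print "unquoted: @unquoted\n";   # unquoted: xyz
print "quoted:   @quoted\n";     # quoted:   abc xyz
```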

score 0 · Accepted Answer

I'm sure there are many ways to do this. Here is one...

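use strict;
use warnings;
use Data::Dumper;

my @arr1 = qw( jon won don pon );
my @arr2 = qw( son kon bon won kon don pon won pon don won);

# Assumes '|' never appears in the data.
my $s2 = join '|', @arr2;

foreach my $item (@arr1) {
    # Remove whole fields only; \Q...\E quotes metacharacters in $item.
    # Stop at the first item that removed anything.
    last if $s2 =~ s/(?:^|(?<=\|))\Q$item\E(?=\||$)//g;
}
# The removals leave empty fields behind; drop them after splitting.
@arr2 = grep { length } split /\|/, $s2;

print Dumper( @arr2 );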
