regex - Split specific strings in an array?

Question

I have an array of strings (@myarray) that looks like this:

rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085

When I use the following code to match it against another, similar array (@otherarray) that has no semicolons and no strings with multiple rsIDs:

for $1(0 .. $#otherarray) {
    for $m(1 .. $#myarray) {
        if ($myarray[$m] =~ /$otherarray[$i]/i) { 
            $IDmatch = 1;
        }
    }
}

The script does not match any of the IDs in the strings that contain semicolons. I tried pulling out the semicolon strings like this:

foreach $string (@myarray) {
    if ($string =~ m/;/) {
        push (@newarray, $string);
    }
}

This returns the array @new:

rs24567;rs324987;rs234985
rs32456;rs2349085

Then I tried to split it on a common character:

foreach $line (@new) {
    $line =~ tr/;//d;
    $line =~ s/rs/ rs/g;
    $line = split (/ /);
}

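But when I print the @new array, it only returns zeros. I know this must have something to do with my loop, because I have trouble with loops in Perl. If you have any ideas, please let me know! Thanks!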

score 2 · Accepted Answer

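You don't say what you want to do with the two arrays, but if I understand your question, it sounds like you want to find all of the rsIDs that appear in both lists.

This program works by converting the first array (please use better names than myarray and otherarray) into a hash that has all of the IDs as keys. It then uses grep to find all of the elements of the second array that appear in the hash, and stores them in the array @dups.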

use strict;
use warnings;

my @myarray = qw(
  rs30000489
  rs903484
  rs24567;rs324987;rs234985
  rs5905002
  rs32456;rs2349085
);

my @otherarray = qw(
  rs3249487
  rs30000489
  rs325987
  rs324987
  rs234967
  rs32456
  rs234567
);

my %rsids = map { $_ => 1 } map { split /;/ } @myarray;

my @dups = grep $rsids{$_}, @otherarray;

print "$_\n" for @dups;

Output

rs30000489
rs324987
rs32456
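The one-line map/split/grep pipeline above can be hard to follow at first, so here is a minimal sketch that spells out the same three steps separately. The variable names and the shortened sample data are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample data, shortened from the arrays above.
my @with_semicolons = qw( rs903484 rs24567;rs324987;rs234985 );
my @candidates      = qw( rs324987 rs999999 );

# Step 1: split every entry on ';' so each rsID becomes its own element.
my @flat = map { split /;/ } @with_semicolons;

# Step 2: build a lookup hash with one key per rsID.
my %seen = map { $_ => 1 } @flat;

# Step 3: keep only the candidates that exist as keys in the hash.
my @common = grep { exists $seen{$_} } @candidates;

print "flat:   @flat\n";
print "common: @common\n";
```

Entries without a semicolon pass through split unchanged, so no special case is needed for them.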

score 0 · Accepted Answer

If you are looking for unique items, the first thing you should think of is a hash. Try:

#!/usr/bin/env perl

use strict;
use warnings;

# --------------------------------------

use charnames qw( :full :short   );
use English   qw( -no_match_vars );  # Avoids regex performance penalty

use Data::Dumper;

# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;

# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};

# --------------------------------------
#       Name: unique
#      Usage: %hash = unique( @array );
#    Purpose: Create a hash of unique keys from array items.
# Parameters: @array -- May have multiple entries separated by a semi-colon
#    Returns:  %hash -- Unique keys of array items
#
sub unique {
  my @array = @_;
  my %hash  = ();

  for my $item ( @array ){
    my @items = split m{ \; }msx, $item;
    $hash{$_} ++ for @items;
  }

  return %hash;
}

# --------------------------------------

my @myarray = qw(
  rs30000489
  rs903484
  rs24567;rs324987;rs234985
  rs5905002
  rs32456;rs2349085
);

my @otherarray = qw(
  rs3249487
  rs30000489
  rs325987
  rs324987
  rs234967
  rs32456
  rs234567
);

my %my_hash = unique( @myarray );
print Dumper \%my_hash if DEBUG;

my %other_hash = unique( @otherarray );
print Dumper \%other_hash if DEBUG;

my %intersection = ();
for my $item ( keys %my_hash ){
  if( exists $other_hash{$item} ){
    $intersection{$item} ++;
  }
}
print Dumper \%intersection if DEBUG;
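The program above only prints its results when the DEBUG environment variable is set. As a rough sketch of the same hash-based intersection that always prints the shared IDs, something like the following could be used. The names `@first`, `@second` and `%in_first` and the shortened data are illustrative assumptions, not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample data.
my @first  = qw( rs30000489 rs24567;rs324987 rs32456;rs2349085 );
my @second = qw( rs30000489 rs324987 rs234567 rs32456 );

# Count every rsID in the first list, splitting multi-ID entries on ';'.
my %in_first;
$in_first{$_}++ for map { split /;/ } @first;

# An ID is in the intersection if it also occurs in the second list.
my @intersection = sort { $a cmp $b } grep { exists $in_first{$_} } @second;

print "$_\n" for @intersection;
```

This version assumes the second list contains no semicolon-joined entries and no repeats; if it might, it would need the same split-and-hash treatment as the first list.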

score 0 · Accepted Answer

Just a few things about loops in Perl. You wrote your for loops in two different ways.

for $1(0 .. $#otherarray) { ... }

and

foreach $line (@new) { ... }

You can write the first loop in exactly the same style as the second.

foreach $1 ( 0..$#otherarray) { .. }

Or better:

foreach my $other_array_content ( @otherarray) { .. }

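The $1 you used is a special variable (Perl sets it during regular expression matches), so it should not be used as a loop variable.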
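To illustrate why $1 is special, here is a small sketch showing that Perl fills it with the text captured by the first pair of parentheses in a successful match. The sample ID and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $id = 'rs24567';

# The parentheses capture the digits; after a successful match
# the captured text is available in $1.
my $digits;
if ( $id =~ /^rs(\d+)$/ ) {
    $digits = $1;
    print "numeric part: $digits\n";
}
```

Because Perl manages $1 itself, an ordinary lexical such as `my $i` is the safer choice for a loop index.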

You can then also use split inside a foreach loop.

foreach my $data (@data) {
  foreach (split /;/,$data) {

  }
}
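It may also help to see how split behaves in scalar versus list context, since the question's loop assigns the result of split to a scalar and calls it without a string argument (so it operates on $_ rather than $line). That is a likely reason for the zeros. The following is a minimal sketch with an assumed sample line:

```perl
use strict;
use warnings;

my $line = 'rs24567;rs324987;rs234985';

# Scalar context: split returns the number of fields, not the fields.
my $count = split /;/, $line;

# List context: split returns the fields themselves.
my @ids = split /;/, $line;

print "count: $count\n";
print "ids:   @ids\n";
```

Assigning to an array (or iterating over the result directly, as in the loop above) keeps the individual IDs instead of a count.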

In short, here is a brief solution to your problem:

my @checked = qw(rs30000489
  rs9033484
  rs2349285
  rs5905402
  rs32456
);

my $idMatch = 0;
chomp( my @data = <DATA> );  # strip trailing newlines so the eq comparisons can match

foreach my $data ( @data ) {
  foreach my $checked ( @checked ) {
    if ( $data =~ m/;/ ) {
      foreach my $data2 ( split /;/, $data ) {
        if ( $checked eq $data2 ) {
          $idMatch = 1;
        }
      }
    } else {
      if ($data eq $checked) {
        $idMatch = 1;
      }
    }
  }
}

print $idMatch;

__DATA__

rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085
