perl - Parsing text out of a string

Question

I have a tab-delimited file, file1:

  20    50  80  110
  520   590 700 770
  410   440 20  50
  300   340 410 440

I read it and push each line onto an array:

while(<INPUT>)
{
    chomp;
    push @inputarray, $_;
}

Now I am looping over another file, file2:

  20, 410, 700
  80, 520
  300

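For each number on each line of file2, I want to search @inputarray for that number. If it is present, I want to grab the number that follows it. For example, for the number 20 I want to get 50. I assume the numbers are still separated by a tab inside the string that is stored as an element of @inputarray.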

while(my $line = <INPUT2>) 
{
  chomp $line;
  my @linearray = split("\t", $line);
  foreach my $start (@linearray)
  {
    if (grep ($start, @inputarray))
    {
       #want to grab the corresponding number
    }
  }
}

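Once grep finds a match, I don't know how to get hold of that array element so that I can locate the number's position and extract the corresponding number with substr. How do I get the array element that grep found?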

The desired output is:

line1:
20 50
410 440
700 770

line2:
80 110
520 590

line3:
300 340

score 2 · Accepted Answer

IMHO, it would be better to store the numbers from file1 in a hash. Going by the sample content of file1 you gave above, you would end up with something like this:

{
   '20'  => '50',
   '80'  => '110',
   '520' => '590',
   '700' => '770',
   '410' => '440',
   '300' => '340'
}

A sample piece of code would look like this:

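my %inputarray;
while (<INPUT>)
{
    chomp;
    my @numbers = split ' ', $_;    # split on whitespace, tabs included
    my $length  = scalar @numbers;
    # Walk the line two fields at a time: (key, value), (key, value), ...
    for (my $i = 0; $i + 1 < $length; $i += 2)
    {
        $inputarray{$numbers[$i]} = $numbers[$i + 1];
    }
}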

A walkthrough of the loop above:

index:    0     1   2    3
numbers: 20    50  80  110

first iteration: $i=0
     $inputarray{$numbers[0]} = $numbers[1];
     $i = 2; #$i += 2;
second iteration: $i=2
     $inputarray{$numbers[2]} = $numbers[3];

Then, when parsing file2, you only need to treat each number as a key into %inputarray.
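The lookup step could be sketched roughly as follows. This is only an illustration: the hash contents and the @file2_lines array are hard-coded stand-ins for data that would really be read from file1 and file2, and it assumes file2 separates its numbers with commas and optional spaces, as in the sample.

```perl
use strict;
use warnings;

# Stand-in for the hash built from file1 (pairs taken from the sample data).
my %inputarray = (
    20  => 50,  80  => 110,
    520 => 590, 700 => 770,
    410 => 440, 300 => 340,
);

# Stand-in for the lines of file2 (assumed comma-separated).
my @file2_lines = ("20, 410, 700", "80, 520", "300");

my $lineno = 0;
my @output;
foreach my $line (@file2_lines) {
    $lineno++;
    push @output, "line$lineno:";
    # Split on commas, tolerating optional whitespace around them.
    foreach my $start (split /\s*,\s*/, $line) {
        # exists() tells us whether the number appeared as a key in file1.
        push @output, "$start $inputarray{$start}" if exists $inputarray{$start};
    }
    push @output, "";    # blank line between groups
}
print "$_\n" for @output;
```

With the sample data this prints the three groups shown in the question's desired output. No grep or substr is needed, because the hash lookup returns the following number directly.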

score 1 · Accepted Answer

I believe this will get you close to what you want.

#!/usr/bin/perl -w

my %follows;

open my $file1, "<", $ARGV[0] or die "could not open $ARGV[0]: $!\n";

while (<$file1>)
{
    chomp;

    my $prev = undef;

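    # split ' ' skips leading whitespace, so there is no empty first field
    foreach my $curr ( split ' ' )
    {
        # test with defined() rather than truthiness, so a value of 0 still counts
        $follows{$prev} = $curr if defined $prev;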
        $prev = $curr;
    }
}

close $file1;

open my $file2, "<", $ARGV[1] or die "could not open $ARGV[1]: $!\n";
my $lineno = 1;

while (<$file2>)
{
    chomp;
    print "line $lineno\n";
    $lineno++;

    foreach my $val ( split /,\s+/, $_ )
    {
        print $val, " ", ($follows{$val} // "no match"), "\n";
    }
    print "\n";
}

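If you only want to treat the numbers in file1 as pairs, rather than recording which number follows which regardless of pair boundaries, then you need to change the logic in the first while loop slightly.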

#!/usr/bin/perl -w

my %follows;

open my $file1, "<", $ARGV[0] or die "could not open $ARGV[0]: $!\n";

while (<$file1>)
{
    chomp;

    my $line = $_;

    while ( $line =~ s/(\S+)\s+(\S+)\s*// )
    {
        $follows{$1} = $2;
    }
}

close $file1;

open my $file2, "<", $ARGV[1] or die "could not open $ARGV[1]: $!\n";
my $lineno = 1;

while (<$file2>)
{
    chomp;
    print "line $lineno\n";
    $lineno++;

    foreach my $val ( split /,\s+/, $_ )
    {
        print $val, " ", ($follows{$val} // "no match"), "\n";
    }
    print "\n";
}
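To make the difference between the two scripts concrete, here is a small sketch that applies both strategies to the first line of file1 from the question. The hash names %consecutive and %pairwise are made up for this illustration and do not appear in the scripts above.

```perl
use strict;
use warnings;

# One line of file1 from the question, split into fields.
my @fields = split ' ', "20\t50\t80\t110";

# Strategy 1: every number maps to the number that follows it.
my %consecutive;
for my $i (0 .. $#fields - 1) {
    $consecutive{ $fields[$i] } = $fields[ $i + 1 ];
}

# Strategy 2: only the first number of each pair becomes a key.
my %pairwise;
for (my $i = 0; $i + 1 < @fields; $i += 2) {
    $pairwise{ $fields[$i] } = $fields[ $i + 1 ];
}

# Print both mappings with keys in numeric order.
print join(", ", map { "$_ => $consecutive{$_}" } sort { $a <=> $b } keys %consecutive), "\n";
print join(", ", map { "$_ => $pairwise{$_}" }    sort { $a <=> $b } keys %pairwise), "\n";
```

The first strategy also records 50 => 80, which crosses a pair boundary; the second records only 20 => 50 and 80 => 110. Which one is right depends on whether 50 should ever be treated as a start value.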

score 0 · Accepted Answer

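If you want to read the input once but check numbers often, you are probably best off using split to break each input line into individual numbers. Then add each number to a hash as a key, with the number that follows it as the value. This makes reading slower and takes more memory, but the second part, where you want to look up the following number, becomes trivial thanks to exists and the nature of hashes.

If I understand your question correctly, you can just use one big hash. This of course assumes that each number is always followed by the same number.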
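The caveat about each number always being followed by the same number can be checked while the hash is built. The following is a rough sketch only: add_pairs is a hypothetical helper invented for this example, and the third input line ("20 60") is made up to trigger a conflict; it is not in the question's data.

```perl
use strict;
use warnings;

# Hypothetical helper: add the pairs on one line to %$follows and return
# the keys that were already mapped to a different value.
sub add_pairs {
    my ($follows, $line) = @_;
    my @conflicts;
    my @fields = split ' ', $line;
    for (my $i = 0; $i + 1 < @fields; $i += 2) {
        my ($key, $value) = @fields[$i, $i + 1];
        if (exists $follows->{$key} && $follows->{$key} ne $value) {
            push @conflicts, $key;
        }
        $follows->{$key} = $value;    # the most recent pair wins
    }
    return @conflicts;
}

my %follows;
my @conflicts;
push @conflicts, add_pairs(\%follows, $_)
    for "20\t50\t80\t110", "410\t440\t20\t50", "20\t60";    # last line is made up

print "conflicting keys: @conflicts\n";
print "20 is followed by $follows{20}\n";
```

Here 20 is reported as a conflicting key because it maps first to 50 and later to 60. If such conflicts can occur in real data, a single flat hash is not enough, and storing a list of values per key would be one alternative.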
