
I have a column file as follows

np  np  n_nom   3   {RP}    {RP}

paNiyappeVttirunna  VM_RP   V_RP    o   o   o

np  np  n_nom   -3  {/RP}   {/RP}

The next few lines are ...

np   np n_nom   3   {RP}    {RP}


paNiya      VM_RP     V_RP    o  o   o

np  np  n_nom   -3  {/RP}   

The file continues like this.

I want to count the number of sections of the file in which both {RP} {RP} and {/RP} {/RP} occur.


1 Answer


This is done very simply using backreferences in the regular expression

The program below searches for any occurrence of {RP} or {/RP} followed by some whitespace and the same string again

It expects the data file as a command line parameter

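use strict;
use warnings;

my $count = 0;  # start at zero so an input with no matches prints 0

while (<>) {
  # \1 must repeat whatever the group captured: {RP} {RP} or {/RP} {/RP}
  $count++ if m|(\{/?RP\})\s+\1|;
}

print "$count occurrences\n";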

output

3 occurrences
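
To see what the backreference does, here is a small sketch that runs the same pattern over a few hand-written lines. The sub name `is_pair` and the sample lines are illustrative only and are not part of the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a line holds the same tag twice in a row,
# separated only by whitespace. \1 must repeat exactly what group 1 captured,
# so a mixed {RP} {/RP} line does not count.
sub is_pair {
  my ($line) = @_;
  return $line =~ m|(\{/?RP\})\s+\1| ? 1 : 0;
}

# Illustrative sample lines, modelled on the data in the question
my @samples = (
  "np  np  n_nom   3   {RP}    {RP}",
  "np  np  n_nom   -3  {/RP}   {/RP}",
  "np  np  n_nom   -3  {/RP}",
  "np  np  n_nom   3   {RP}    {/RP}",
);

for my $line (@samples) {
  printf "%-5s %s\n", (is_pair($line) ? 'pair' : 'no'), $line;
}
```

Only the first two sample lines are reported as pairs. A lone tag and a mixed opening/closing pair are both rejected.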

Update

Your description of the problem is very unclear but I have done my best at reinterpreting it. This code looks for all cases where a line containing {/RP} <some whitespace> {/RP} is followed immediately by a line containing {RP} <some whitespace> {RP}. All blank input lines are ignored

use strict;
use warnings;

my @pair;
my $count = 0;

while (<>) {
  next unless /\S/;
  push @pair, $_;
  next unless @pair >= 2;
  shift @pair while @pair > 2;
  if ($pair[0] =~ m|\{/RP\}\s+\{/RP\}| and $pair[1] =~ m|\{RP\}\s+\{RP\}|) {
    $count++;
    @pair = ();
  }
}

print "$count occurrences\n";

output

1 occurrences

Update

OK, let's try again. This program checks the fifth and sixth whitespace-separated columns of every line. Whenever it sees a pair of {RP} it sets $depth to 1, and whenever it sees a pair of {/RP} it sets $depth to zero, incrementing $count if $depth was previously non-zero

Note that all lines containing only a single {RP} or {/RP} are simply ignored. It is impossible to tell from your description what action you want in this circumstance

use strict;
use warnings;

my $depth;
my $count = 0;

while (<>) {
  my @fields = map $_ // '', (split)[4,5];
  if (grep($_ eq '{RP}', @fields) == 2) {
    $depth = 1;
  }
  elsif (grep($_ eq '{/RP}', @fields) == 2) {
    $count++ if $depth;
    $depth = 0;
  }
}

print "$count occurrences\n";

output

1 occurrences
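
For experimenting without a data file, the same open/close logic can be wrapped in a sub that takes the lines as a list. This is only a sketch: the name `count_sections` and the in-memory `@data` (copied from the question, with blank lines left out) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: counts sections that open with
# a line whose fifth and sixth columns are both {RP} and close with a line
# whose fifth and sixth columns are both {/RP}.
sub count_sections {
  my @lines = @_;
  my $open  = 0;
  my $count = 0;

  for my $line (@lines) {
    my @cols = split ' ', $line;
    # Columns five and six (indices 4 and 5); missing columns become ''
    my $c5 = $cols[4] // '';
    my $c6 = $cols[5] // '';

    if ($c5 eq '{RP}' and $c6 eq '{RP}') {
      $open = 1;
    }
    elsif ($c5 eq '{/RP}' and $c6 eq '{/RP}') {
      $count++ if $open;
      $open = 0;
    }
    # Lines with only a single {RP} or {/RP} fall through and are ignored
  }
  return $count;
}

# Sample data from the question
my @data = (
  "np  np  n_nom   3   {RP}    {RP}",
  "paNiyappeVttirunna  VM_RP   V_RP    o   o   o",
  "np  np  n_nom   -3  {/RP}   {/RP}",
  "np   np n_nom   3   {RP}    {RP}",
  "paNiya      VM_RP     V_RP    o  o   o",
  "np  np  n_nom   -3  {/RP}",
);

print count_sections(@data), " occurrences\n";
```

The second section in the sample is opened by a {RP} pair but closed by a single {/RP}, so it is not counted.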
Answered 2012-08-31T11:14:02.317