perl - Skipping specific positions in a string with the substitution operator in Perl

Question

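Yesterday I got stuck on a Perl script. To simplify: suppose there is a string (say ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD). First, I have to break it at every position where an "E" occurs, and second, break it specifically where the user wants. The condition, however, is that the program should not cut at sites where E is followed by P. For example, there are 6 E's in this sequence, so one should get 7 fragments, but since 2 of the E's are followed by P, one gets only 5 fragments in the output.

I need help with the second case. Suppose the user does not want to cut this sequence at, say, the 5th and 10th positions of E in the sequence; what should the script be so that the program skips only these two sites? My script for the first case is: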

my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';

$otext=~ s/([E])/$1=/g; #Main cut rule.

$otext=~ s/=P/P/g;

@output = split( /\=/, $otext);

print "@output";

Please help!

score 4 · Accepted Answer

To split on "E" except where it is followed by "P", you should use a negative lookahead assertion.

From the "Look-Around Assertions" section of perldoc perlre:

(?!pattern)
A zero-width negative lookahead assertion.
For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".

my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD'; 
#                E    E    EP    E    EP    E
my @output=split(/E(?!P)/, $otext); 
use Data::Dumper; print Data::Dumper->Dump([\@output]);

$VAR1 = [
          'ABCD',
          'ABCD',
          'ABCDEPABCD',
          'ABCDEPABCD',
          'ABCD'
        ];
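
Since the question talks about positions, it may help to see exactly where the lookahead pattern matches. The following is a minimal sketch, not part of the original answer: `cut_sites` is a hypothetical helper name, and it relies on Perl's built-in `@-` array to read the 0-based offset of each match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based offsets of every "E" that is
# not immediately followed by "P", i.e. the places where split would cut.
sub cut_sites {
    my ($text) = @_;
    my @offsets;
    # In scalar context, m//g resumes after the previous match;
    # $-[0] holds the offset where the current match starts.
    push @offsets, $-[0] while $text =~ /E(?!P)/g;
    return @offsets;
}

my @sites = cut_sites('ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD');
print "@sites\n";   # prints "4 9 20 31"
```

The E's at offsets 14 and 25 are missing from the list because each is followed by a P, which is why split produces 5 pieces rather than 7.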

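Now, in order not to cut at occurrences #2 and #4, you can do one of 2 things:

Concoct a really fancy regex that automatically fails to match on the given occurrences. I will leave that one for someone else to attempt in an answer, for completeness' sake.

Simply stitch the right fragments back together.

I'm too brain-dead to come up with a good idiomatic way of doing it, but the quick and dirty one is: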

  my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
  my @output_final;
  for(my $i=0; $i < @output; $i++) {
      if ($no_cuts{$i}) {
          $output_final[-1] .= $output[$i];
      } else {
          push @output_final, $output[$i];
      } 
  }
  print Data::Dumper->Dump([\@output_final]);

  $VAR1 = [
            'ABCD',
            'ABCDABCDEPABCD',
            'ABCDEPABCDABCD'
          ];

Or, even simpler:

  my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
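  for(my $i=0; $i < @output; $i++) {
      next unless $no_cuts{$i};      # Only merge at the no-cut positions
      $output[$i-1] .= $output[$i];  # Glue this piece onto the previous one
      $output[$i]=undef; # Make the slot empty
  }
  my @output_final = grep { defined } @output; # Skip empty slots
  print Data::Dumper->Dump([\@output_final]);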

  $VAR1 = [
            'ABCD',
            'ABCDABCDEPABCD',
            'ABCDEPABCDABCD'
          ];
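
The first option, letting the regex side do the skipping, can be approximated with the substitution operator and its /e modifier rather than with one fancy pattern. This is only a rough sketch under stated assumptions: cut sites are numbered from 1 in the order they occur, "=" never appears in the input (the question's script assumes the same), and `cut_except` is a made-up name. Unlike the stitching code, it keeps the E at the end of each piece.

```perl
use strict;
use warnings;

# Hypothetical helper: cut after every "E" not followed by "P", except at
# the cut sites whose 1-based occurrence number appears in @skip.
# Assumes "=" never occurs in the input, as in the question's own script.
sub cut_except {
    my ($text, @skip) = @_;
    my %skip = map { $_ => 1 } @skip;
    my $n = 0;
    # /e evaluates the replacement as code, so each cut site can be counted
    # and the "=" marker is only added where a cut is allowed.
    $text =~ s/E(?!P)/ $skip{++$n} ? 'E' : 'E=' /ge;
    return split /=/, $text;
}

my @pieces = cut_except('ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD', 2, 4);
print join("\n", @pieces), "\n";
# ABCDE
# ABCDEABCDEPABCDE
# ABCDEPABCDEABCD
```

If the user thinks in string offsets instead of occurrence numbers, the same shape should work with `$-[0]` as the hash key in place of the counter.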

score 0 · Accepted Answer

This is a dirty trick that exploits two facts:

Normal text strings never contain null bytes (if you don't know what a null byte is, as a programmer you should: http://en.wikipedia.org/wiki/Null_character; N.B. it is not the same thing as the number 0 or the character 0).
Perl strings can contain null bytes if you put them there, but be careful, as this may screw up some Perl internal functions.

The "be careful" is just a point to be aware of. Anyway, the idea is to substitute a null byte at the point where you don't want breaks:

my $s = "ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD";

my @nobreak = (4,9);

foreach (@nobreak) {
    substr($s, $_, 1) = "\0";
}

"\0" is an escape sequence representing a null byte like "\t" is a tab. Again: it is not the character 0. I used 4 and 9 because there were E's in those positions. If you print the string now it looks like:

ABCDABCDABCDEPABCDEABCDEPABCDEABCD

The null bytes don't display, but they are there, and we will swap them back in later. First, the split:

my @a = split(/E(?!P)/, $s);

Then swap the null bytes back:

$_ =~ s/\0/E/g foreach (@a);

If you print @a now, you get:

ABCDEABCDEABCDEPABCD
ABCDEPABCD
ABCD

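Which is exactly what you want. Note that split removes the delimiters (in this case, the E's); if you intend to keep those, you can tack them back on afterwards. If the delimiter comes from a more dynamic regex, it is slightly more complicated; see here: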

http://perlmeme.org/howtos/perlfunc/split_function.html

"Example 9. Keeping the delimiter"
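
As a hedged illustration of keeping the delimiters: when the split pattern is wrapped in capturing parentheses, split also returns the matched delimiters, so they can be glued back onto the piece before them. `split_keeping_delims` is a hypothetical helper, and the sketch assumes the delimiter regex has no capturing groups of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on $delim_re but keep each delimiter
# attached to the end of the piece before it.
# Assumes $delim_re contains no capturing groups of its own.
sub split_keeping_delims {
    my ($delim_re, $text) = @_;
    # With capturing parentheses, split also returns the matched delimiters,
    # so the list alternates: piece, delimiter, piece, delimiter, ...
    my @fields = split /($delim_re)/, $text;
    my @pieces;
    while (@fields) {
        my $piece = shift @fields;
        my $delim = @fields ? shift @fields : '';
        push @pieces, $piece . $delim;
    }
    return @pieces;
}

my @kept = split_keeping_delims(qr/E(?!P)/, 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD');
print join("\n", @kept), "\n";
```

Because the delimiter is taken from what actually matched, this should also cope with delimiters of varying length.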

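If there is any chance that those @nobreak positions are not E's, then you must also keep track of the characters when you swap them out, to make sure you put the right ones back afterwards.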
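
One way this bookkeeping could look, as a sketch only: save each replaced character in order, then hand them back in the same order when the null bytes are restored. `split_protecting` is a hypothetical name, and the code assumes the positions are distinct, lie inside the string, and that the input contains no null bytes of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: split on "E" not followed by "P", but never at the
# 0-based offsets listed in @nobreak, whatever character sits there.
# Assumes distinct, in-range offsets and no null bytes in the input.
sub split_protecting {
    my ($s, @nobreak) = @_;
    my @saved;
    for my $pos (sort { $a <=> $b } @nobreak) {
        # Four-argument substr replaces in place and returns the old text.
        push @saved, substr($s, $pos, 1, "\0");
    }
    my @parts = split /E(?!P)/, $s;
    # Null bytes are met in left-to-right order, matching the order of @saved.
    s/\0/shift @saved/ge for @parts;
    return @parts;
}

my @a = split_protecting('ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD', 4, 9);
print join("\n", @a), "\n";
# ABCDEABCDEABCDEPABCD
# ABCDEPABCD
# ABCD
```

Sorting the offsets first is what makes the in-order restore safe, since the pieces come back from split in left-to-right order.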
