This is a dirty trick that exploits two facts:
- Normal text strings never contain null bytes. (If you don't know what a null byte is, as a programmer you should: http://en.wikipedia.org/wiki/Null_character. N.B. it is not the same thing as the number 0 or the character 0.)
- Perl strings can contain null bytes if you put them there, but be careful, as this may screw up some Perl internal functions.
The "be careful" is just a point to be aware of. Anyway, the idea is to substitute a null byte at the point where you don't want breaks:
my $s = "ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD";
my @nobreak = (4,9);
foreach (@nobreak) {
    substr($s, $_, 1) = "\0";
}
"\0" is an escape sequence representing a null byte, just as "\t" is a tab. Again: it is not the character 0. I used 4 and 9 because there were E's in those positions. If you print the string now it looks like:
ABCDABCDABCDEPABCDEABCDEPABCDEABCD
The null bytes don't display, but they are there, and we will swap them back later. First, the split:
my @a = split(/E(?!P)/, $s);
Then swap the null bytes back:
$_ =~ s/\0/E/g foreach (@a);
If you print @a now, you get:
ABCDEABCDEABCDEPABCD
ABCDEPABCD
ABCD
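This is exactly what you want. Note that split has removed the delimiters (in this case, the E's); if you want to keep them, you can add them back afterwards. If the delimiter comes from a more dynamic regexp it is slightly more complicated; see here: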
http://perlmeme.org/howtos/perlfunc/split_function.html
"Example 9. Keeping the delimiter"
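For the fixed delimiter used here, adding the E's back can be sketched roughly as follows. This is only an illustration: the array name @with_sep is made up, and it assumes the original string did not itself end in a delimiter, so every piece except the last was followed by exactly one E.

```perl
use strict;
use warnings;

# The pieces as they look after the split and the swap-back step.
my @a = ('ABCDEABCDEABCDEPABCD', 'ABCDEPABCD', 'ABCD');

# Assumption: each piece except the last was terminated by an "E"
# that split removed, so append one to all but the final element.
my @with_sep = @a;    # hypothetical name, copy so @a stays untouched
$with_sep[$_] .= 'E' for 0 .. $#with_sep - 1;

print "$_\n" for @with_sep;

# Joining the pieces should give back the original string.
print join('', @with_sep), "\n";
```

With a variable delimiter you cannot just append a constant like this, which is where the capturing approach from the link comes in.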
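If there is a chance that the @nobreak positions hold something other than E, you will also have to keep track of which character was at each position when you swap them out, so that you put the right one back.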
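One way to do that tracking, as a minimal sketch: save the characters in left-to-right order and hand them back in the same order. The names @saved and @pieces and the position 8 are made up for the example (position 8 holds a D rather than an E), and it assumes the string contained no null bytes to begin with.

```perl
use strict;
use warnings;

my $s = "ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD";
my @nobreak = (4, 8);    # hypothetical: position 8 is a "D", not an "E"

# Save the original characters in ascending position order,
# then overwrite each one with a null byte.
my @saved;
foreach my $pos (sort { $a <=> $b } @nobreak) {
    push @saved, substr($s, $pos, 1);
    substr($s, $pos, 1) = "\0";
}

my @pieces = split(/E(?!P)/, $s);

# split keeps the pieces in order, so the null bytes turn up in the
# same order they were inserted; give each one its saved character.
s/\0/shift @saved/ge foreach @pieces;

print "$_\n" for @pieces;
```

Sorting the positions first matters: the swap-back relies on the saved characters being in the same order as the null bytes appear in the string.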