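I have a file to go through that has string literals scattered throughout it. Some are wrapped in particular macros and some are not. A single line may contain more than one string literal. How can I write a regex that wraps a macro around every literal that is not already wrapped in one of a specific set of macros? The set of macros that exempt a literal from wrapping has more than one member, but it is small (say, 3).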

So if the macro set is FOO, BAR and BAZ, and I want to wrap every other, unwrapped string literal with BAFF, then given:

JBAZ ( "str \" " ) BAZ( " \" boo" ) BAR ("boo") hello(" jazz ") FOO("FUN")

the result would be:

JBAZ (BAFF("str \" ")) BAZ( " \" boo" ) BAR ("boo") hello(BAFF(" jazz ")) FOO("FUN")

I'm not even sure this can be done in a single regex, but bonus points for anyone who manages it. ;)

EDIT: OK, here is one attempt I made:

my $qs = q("(?:\\\\.|[^"])*")
# Read in characters until it hits a double quote and then check if string before
# it is not \bFOO, \bBAR or \bBAZ.  Then read in quoted string and put BAFF() 
# around it.
s/([^"]*)(?<!\bFOO)(?<!\bBAR)(?<!\bBAZ)[[:space:]](?<!\))*\($qs\))/$1BAFF($2)/g

# Doesn't work since it'll find an end quote or a quoted quote and match replace
# from there:
# JBAZ ( BAFF("str \" ") ) BAZ( BAFF(" \" boo") ) BAR ("booBAFF(") hello(") jazz BAFF(") FOO(")FUN")

1 Answer

You can use this:

my $string = 'JBAZ ( "str \" " ) BAZ( " \" boo" ) BAR ("boo") hello(" jazz ") FOO("FUN")';
$string =~ s/\b(?>FOO|BAR|BAZ)\s*+\(\s*+"(?>[^"\\]++|\\{2}|\\(?s).)*+"\s*+\)(*SKIP)(?!)|"(?>[^"\\]++|\\{2}|\\(?s).)*+"/BAFF($&)/g;
print $string;

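Pattern details:

The pattern has two parts. The first part matches every FOO, BAR or BAZ call together with its quoted argument and then forces the match to fail; the second part matches any other double-quoted string.

First part: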

\b(?>FOO|BAR|BAZ)  # FOO, BAR or BAZ
\s*+\(\s*+"        # opening parenthesis and opening double quote
(?>                # atomic group describing the content allowed inside quotes
    [^"\\]++       # any characters that are neither a quote nor a backslash
  |                # OR
    \\{2}          # two backslashes (an escaped backslash)
  |                # OR
    \\(?s).        # any other escaped character (so \" is allowed)
)*+                # repeat the group zero or more times
"\s*+\)            # the closing quote and closing parenthesis
(*SKIP)            # if the pattern fails after this point, the engine may not
                   # backtrack past it, and the next attempt starts here
(?!)               # force the pattern to fail (an empty negative lookahead
                   # never matches); you can write (*FAIL) instead
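
To see the (*SKIP)(*FAIL) idiom in isolation, here is a minimal sketch on a deliberately simpler, made-up input (the bracket syntax and the word "cat" are illustration-only assumptions, not part of the question). The left branch consumes what should be left alone and throws it away; the right branch matches what should be replaced. It needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical input: uppercase the word "cat" except where it sits
# inside [brackets].
my $text = 'cat [cat] concat cat';

# Left branch: consume a whole bracketed chunk, then (*SKIP)(*FAIL) rejects it
# and makes the next attempt start right after the chunk.
# Right branch: the text we actually want to replace.
(my $out = $text) =~ s/\[[^\]]*\](*SKIP)(*FAIL)|\bcat\b/CAT/g;

print "$out\n";
```

The bracketed "cat" is never offered to the right-hand branch, because the search resumes after the closing bracket.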

The second part is simple: it reuses the same description of double-quoted content as the first part.

"(?>[^"\\]++|\\{2}|\\(?s).)*+"

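As a quick check of that subpattern on its own, the sketch below compiles it once and lists every string literal it finds in a line. The variable names and the shortened sample line are assumptions made for illustration.

```perl
use strict;
use warnings;

# The string-literal subpattern from the answer, compiled once.
my $quoted = qr/"(?>[^"\\]++|\\{2}|\\(?s).)*+"/;

# In a single-quoted Perl string, \" stays as a backslash followed by a quote.
my $line = 'JBAZ ( "str \" " ) hello(" jazz ")';

# With /g in list context and no capture groups, this returns every full match.
my @literals = $line =~ /$quoted/g;

print "$_\n" for @literals;
```

The escaped quote inside the first literal does not end the match, which is the case the asker's own attempt tripped over.
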
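Note: regarding the second part, since the pattern is getting rather long, it may be worth using the (?(DEFINE)...) syntax and the /x modifier to make it more readable and to avoid repeating this subpattern: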

my $pattern = qr/
(?(DEFINE) (?<quoted> " (?> [^"\\]++ | \\{2} | \\. )*+ " ) )
    \b (?> FOO | BAR | BAZ )
    \s*+ \( \s*+ (?&quoted) \s*+ \)
    (*SKIP) (*FAIL)
    |
    (?&quoted) /xs;

$string =~ s/$pattern/BAFF($&)/g;
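
Putting the pieces together, the following is one possible way to package the same pattern as a reusable routine. The helper name wrap_unwrapped and its parameters are hypothetical, introduced only for this sketch; the regex itself follows the (?(DEFINE)...) version above, with the macro names interpolated instead of hard-coded. It assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every string literal in $wrapper(...) unless it is
# already the sole argument of one of the macros named in @$exempt.
sub wrap_unwrapped {
    my ($text, $wrapper, $exempt) = @_;

    # Build the FOO|BAR|BAZ alternation, escaping any regex metacharacters.
    my $names = join '|', map { quotemeta($_) } @$exempt;

    my $pattern = qr/
        (?(DEFINE) (?<quoted> " (?> [^"\\]++ | \\{2} | \\. )*+ " ) )
        \b (?> $names )
        \s*+ \( \s*+ (?&quoted) \s*+ \)
        (*SKIP) (*FAIL)          # exempt call: discard it and move on
        |
        (?&quoted)               # any other string literal
    /xs;

    # $& is the whole match, i.e. the unwrapped literal.
    $text =~ s/$pattern/${wrapper}($&)/g;
    return $text;
}

my $input  = 'JBAZ ( "str \" " ) BAZ( " \" boo" ) BAR ("boo") hello(" jazz ") FOO("FUN")';
my $result = wrap_unwrapped($input, 'BAFF', [qw(FOO BAR BAZ)]);
print "$result\n";
```

Note that the replacement wraps only the literal itself, so whitespace between a parenthesis and the literal is preserved as in the input.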
answered 2013-11-15T01:26:54.097