I'm working on PBXProject, a pure Ruby implementation of a parser for Xcode project files, and I need a little help with a regular expression.

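PBXProject files contain many odd lines that mix several kinds of content. The regex I have now is (.*?) = (.*?)( \/\* (.*) \*\/)?; ? and it works for the simpler case (the first line below). On the second line, however, it stops too early, at the first ; character.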

isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;

isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };

From these lines I want simple name = value pairs, such as:

isa = PBXBuildFile
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }

Is there a simple way to do this with a single regex?
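Here is a minimal Ruby reproduction of the problem. It assumes the pattern is applied with String#scan, because the question does not show how it is called. On the second example line, the settings value stops at the first ; inside the braces:

```ruby
# The pattern from the question: name = value, an optional /* comment */, then ';'.
QUESTION_RE = /(.*?) = (.*?)( \/\* (.*) \*\/)?; ?/

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
       'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# Keep only the name and value captures; the comment groups are ignored.
pairs = line.scan(QUESTION_RE).map { |name, value, _comment, _text| [name, value] }
pairs.each { |name, value| puts "#{name} = #{value}" }
```

The lazy (.*?) value ends at the first ; that follows it, so the settings value loses its closing "; }".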

2 Answers

This regex should work:

[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))

Note that it supports only one level of braces. The regex does not handle nested braces.

On your example, it captures the following:

isa = PBXBuildFile
fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */
isa = PBXBuildFile
fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }
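Here is a minimal Ruby sketch that applies this pattern with String#scan. It assumes each example line is processed separately. The literal braces are written as \{ and \} here, which matches the same text. Splitting each match into a name/value pair on the first = is an illustrative addition and is not part of the answer:

```ruby
# The answer's pattern, with the literal braces escaped for readability.
PAIR_RE = /[a-zA-Z0-9]*\s*?=\s*?.*?(?:\{[^}]*\}|(?=;))/

line1 = 'isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;'
line2 = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
        'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# The pattern has no capture groups, so scan returns the whole matched substrings.
matches = [line1, line2].flat_map { |line| line.scan(PAIR_RE) }
matches.each { |m| puts m }

# Split each match into [name, value] on the first '=' only.
pairs = matches.map { |m| m.split("=", 2).map(&:strip) }
```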

Here is an explanation of the regex:

[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))

Options: ^ and $ match at line breaks

Match a single character present in the list below «[a-zA-Z0-9]*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    A character in the range between “a” and “z” «a-z»
    A character in the range between “A” and “Z” «A-Z»
    A character in the range between “0” and “9” «0-9»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match any single character that is not a line break character «.*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below «(?:{[^}]*}|(?=;))»
    Match either the regular expression below (attempting the next alternative only if this one fails) «{[^}]*}»
        Match the character “{” literally «{»
        Match any character that is NOT a “}” «[^}]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
        Match the character “}” literally «}»
    Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?=;)»
        Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=;)»
            Match the character “;” literally «;»
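The one-level limitation is easy to see in practice. The sketch below runs the same pattern on a hypothetical line that contains a nested block; the input is made up for illustration. Because [^}]* stops at the first }, the outer value is truncated and the inner C = 2 is returned as a separate match:

```ruby
# The answer's pattern (braces escaped): a {...} block ends at the first '}'.
PAIR_RE = /[a-zA-Z0-9]*\s*?=\s*?.*?(?:\{[^}]*\}|(?=;))/

# Hypothetical input: one brace block nested inside another.
nested = 'buildSettings = {A = {B = 1; }; C = 2; };'

p nested.scan(PAIR_RE)
```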

answered 2012-09-07T11:33:25.800

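Depending on exactly what you want to parse, a single finite expression may not be enough. The second line, the one that gives you trouble, suggests that nested patterns may be involved. A regular expression can only match nesting up to a fixed depth, which is one reason why parsing [X]HTML with regular expressions is discouraged. If you really need to handle nesting of arbitrary depth, you may want to look at something like Treetop.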
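A regex is not the only way to handle arbitrary nesting. Another approach is to track brace depth and split only on semicolons at depth zero. Below is a minimal hand-written sketch of that idea. It is not Treetop and not code from this answer. split_top_level_pairs is a hypothetical helper name, and the sketch assumes braces and semicolons never appear inside quoted strings:

```ruby
# Hypothetical helper: split "name = value; name = value; ..." into pairs,
# treating ';' as a separator only when it is outside every {...} block.
# Assumption: braces and semicolons never occur inside quoted strings.
def split_top_level_pairs(line)
  pairs = []
  depth = 0
  current = +""

  line.each_char do |ch|
    case ch
    when "{" then depth += 1
    when "}" then depth -= 1
    when ";"
      if depth.zero?
        # End of a top-level "name = value" chunk: split on the first '='.
        name, value = current.split("=", 2).map(&:strip)
        pairs << [name, value] unless name.nil? || name.empty? || value.nil?
        current = +""
        next
      end
    end
    current << ch
  end

  pairs
end

line = 'isa = PBXBuildFile; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'
nested = 'buildSettings = {A = {B = 1; }; C = 2; };'

p split_top_level_pairs(line)
p split_top_level_pairs(nested)
```

Nested blocks of any depth come back as a single value. To parse the inner pairs, strip the outer braces from that value and pass it back into the same function.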

If it doesn't need to be robust, you could try an expression like this:

/((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

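It first tries to match text of the form something = {anything}. If that fails, it matches something = something followed by a ;. You should be able to use string.scan(/regex/) to find all matches in a given string. Handling the {...} blocks this way should keep the match from ending too early, and you can then extract the pairs easily.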
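Here is a minimal Ruby sketch of that scan call, assuming one line of input at a time. Because the pattern has a capture group, scan returns a single-element array for each match. The sketch therefore flattens and strips the results before splitting each one on the first =:

```ruby
# The answer's pattern, unchanged. Group 1 holds one "name = value" chunk.
CHUNK_RE = /((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; ' \
       'settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# scan yields [[chunk], [chunk], ...]; chunks after the first keep a leading space.
chunks = line.scan(CHUNK_RE).flatten.map(&:strip)

# Split each chunk into [name, value] on the first '=' only.
pairs = chunks.map { |chunk| chunk.split("=", 2).map(&:strip) }
p pairs
```

The first answer's pattern has the same limit: .*?\} stops at the first closing brace, so nested blocks are still cut short.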

Further reading:

answered 2012-09-07T10:49:08.270