ruby - Regular expression for PBXProject files

Question

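I'm working on a pure Ruby implementation of a parser for the Xcode project file (PBXProject), and I need a little help with a regular expression.

The PBXProject file has a bunch of odd lines that mix several kinds of content. What I have now is the regex (.*?) = (.*?)( \/\* (.*) \*\/)?; ? which works for the simpler case (the first line below). For the second line, however, it cuts off too early (at the first ; character inside the braces).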

isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;

isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };
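
A minimal reproduction of the problem, assuming the second sample line is held in a Ruby string (the names CURRENT_PATTERN, line, and pairs are only illustrative):

```ruby
# The pattern currently in use, copied from the question text.
CURRENT_PATTERN = /(.*?) = (.*?)( \/\* (.*) \*\/)?; ?/

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# String#scan returns one array of capture groups per match;
# only the first two groups (name and value) are kept here.
pairs = line.scan(CURRENT_PATTERN).map { |name, value, _comment, _text| [name, value] }

pairs.each { |name, value| puts "#{name} => #{value}" }
# The settings value stops at the first ";" inside the braces,
# so its closing "}" is missing.
```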

So I'd like to get simple name = value pairs out of these lines, i.e.

isa = PBXBuildFile
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }

Is there a simple way to achieve this with a single regular expression?

score 1 · Accepted Answer

This regular expression should do the job:

[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))

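Note that only one level of braces is supported; the regex will not handle nested braces.

Applied to your example lines, it matches the following: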

isa = PBXBuildFile
fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */
isa = PBXBuildFile
fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */
settings = {COMPILER_FLAGS = "-fno-objc-arc"; }
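
As a rough sketch of how this could be used from Ruby, assuming the two sample lines from the question are held in an array (the names PAIR, lines, matches, and pairs are illustrative, and the braces are escaped here only for readability):

```ruby
# The pattern from this answer, with the literal braces escaped.
PAIR = /[a-zA-Z0-9]*\s*?=\s*?.*?(?:\{[^}]*\}|(?=;))/

lines = [
  'isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;',
  'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'
]

# The pattern has no capture groups, so String#scan returns each
# whole match as a string.
matches = lines.flat_map { |line| line.scan(PAIR) }

# Split each match on its first "=" to get a [name, value] pair.
pairs = matches.map { |m| m.split("=", 2).map(&:strip) }
pairs.each { |name, value| puts "#{name} = #{value}" }
```

Splitting on the first "=" only (the limit argument of 2) keeps the inner "=" of the settings block inside the value.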

Here is an explanation of the regular expression:

[a-zA-Z0-9]*\s*?=\s*?.*?(?:{[^}]*}|(?=;))

Options: ^ and $ match at line breaks

Match a single character present in the list below «[a-zA-Z0-9]*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    A character in the range between “a” and “z” «a-z»
    A character in the range between “A” and “Z” «A-Z»
    A character in the range between “0” and “9” «0-9»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match any single character that is not a line break character «.*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below «(?:{[^}]*}|(?=;))»
    Match either the regular expression below (attempting the next alternative only if this one fails) «{[^}]*}»
        Match the character “{” literally «{»
        Match any character that is NOT a “}” «[^}]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
        Match the character “}” literally «}»
    Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?=;)»
        Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=;)»
            Match the character “;” literally «;»
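
The same token sequence can also be written in Ruby's free-spacing mode (/x) so that the explanation sits next to each part. This is only a sketch: the two capture groups and the names PAIR_X, line, and pairs are additions for illustration and are not part of the pattern above.

```ruby
# Same tokens as the pattern explained above, in free-spacing mode,
# with capture groups added around the name and the value.
PAIR_X = /
  ([a-zA-Z0-9]*)       # name: letters and digits
  \s*? = \s*?          # the equals sign, with optional whitespace
  (                    # value:
    .*?                #   as few characters as possible, then either
    (?: \{ [^}]* \}    #   one single-level brace block
      | (?=;)          #   or a position right before a semicolon
    )
  )
/x

line = 'isa = PBXBuildFile; fileRef = C0480C2015F4F91F00E0A2F4 /* zip.c */;'

# With capture groups, String#scan returns [name, value] arrays.
# The lazy whitespace leaves a leading space on the value, hence strip.
pairs = line.scan(PAIR_X).map { |name, value| [name, value.strip] }
pairs.each { |name, value| puts "#{name} = #{value}" }
```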

score 0 · Accepted Answer

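Depending on the exact nature of what you want to parse, it may not be possible with a single finite expression. The second line you are having trouble with suggests that nested patterns may be involved. Nested patterns can only be matched to a finite depth, which is one of the reasons parsing [X]HTML with regular expressions is discouraged. If you really want to handle nesting of arbitrary depth, you may want to look into something like Treetop.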
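
To illustrate what handling arbitrary nesting involves, here is a small hand-written recursive sketch built on StringScanner from Ruby's standard library. It is a stand-in for a real grammar, not Treetop itself; the method name parse_pairs and the assumed syntax (name = value; with optional { ... } blocks and double-quoted strings) are guesses based only on the two sample lines.

```ruby
require "strscan"

# Hypothetical helper: reads "name = value;" pairs from the scanner and
# recurses whenever a value opens a { ... } block.
def parse_pairs(scanner)
  pairs = {}
  loop do
    scanner.skip(/\s+/)
    # Stop at end of input or at the brace that closes the current block.
    break if scanner.eos? || scanner.check(/\}/)

    name = scanner.scan(/[A-Za-z0-9_]+/) or raise "expected a name at #{scanner.pos}"
    scanner.skip(/\s*=\s*/) or raise "expected '=' at #{scanner.pos}"

    if scanner.skip(/\{/)
      value = parse_pairs(scanner) # nested block becomes a nested Hash
      scanner.skip(/\s*\}/) or raise "expected '}' at #{scanner.pos}"
    else
      # A double-quoted string, or anything up to the next ";".
      value = scanner.scan(/"(?:[^"\\]|\\.)*"|[^;]+/) or raise "expected a value at #{scanner.pos}"
      value = value.strip
    end

    scanner.skip(/\s*;/) or raise "expected ';' at #{scanner.pos}"
    pairs[name] = value
  end
  pairs
end

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'
result = parse_pairs(StringScanner.new(line))
puts result.inspect
```

Because the method calls itself for every opening brace, the nesting depth is limited only by the input, which is the property a single non-recursive regex lacks.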

If you don't need it to be robust, you could try an expression like this:

/((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

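It first tries to match something of the form something = {anything}; if that fails, it matches something = something up to the next ;. You should be able to use string.scan(/regex/) to find all matches in a given string. Handling the blocks this way should avoid problems such as ending the match too early, and it lets you extract the pairs easily.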
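
A short sketch of that scan call, assuming the second sample line from the question (the names LOOSE, line, and pairs are illustrative only):

```ruby
# The expression from this answer. It has one capture group, so
# String#scan returns a one-element array per match.
LOOSE = /((?i)(?:[^;]+=\s*\{.*?\})|[^;]+=[^;]+);/

line = 'isa = PBXBuildFile; fileRef = C0480C2315F4F91F00E0A2F4 /* ZipArchive.mm */; settings = {COMPILER_FLAGS = "-fno-objc-arc"; };'

# Flatten the one-element arrays, then split each match on its first "=".
# The matches keep the whitespace that follows the previous ";", hence strip.
pairs = line.scan(LOOSE).flatten.map { |text| text.split("=", 2).map(&:strip) }
pairs.each { |name, value| puts "#{name} = #{value}" }
```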
