java - Matching INI section blocks

Question

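I am using a regular expression to try to match section blocks in an INI file. I am using the recipe given in the book Regular Expressions Cookbook, but it doesn't seem to work for me.

Here is the code I am using: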

final BufferedReader in = new BufferedReader(
    new FileReader(file));
String s;
String s2 = "";
while((s = in.readLine())!= null)
    s2 += s + System.getProperty("line.separator");
in.close();

final String regex = "^\\[[^\\]\r\n]+](?:\r?\n(?:[^\r\n].*)?)*";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String sectionBlock = null;
final Matcher regexMatcher = pattern.matcher(s2);
if (regexMatcher.find()) {
    sectionBlock = regexMatcher.group();
}

Here are the contents of my input file:

[Section 2]
Key 2.0=Value 2.0
Key 2.2=Value 2.2
Key 2.1=Value 2.1

[Section 1]
Key 1.1=Value 1.1
Key 1.0=Value 1.0
Key 1.2=Value 1.2

[Section 0]
Key 0.1=Value 0.1
Key 0.2=Value 0.2
Key 0.0=Value 0.0

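The problem is that sectionBlock ends up equal to the entire contents of the file, not just the first section.

(I don't know whether it matters, but I am doing this on Windows, and the line separators in s2 are "\r\n" (at least, that is how the IDEA debugger displays them).)

What am I doing wrong here?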

score 5 · Accepted Answer

Try this regex:

(?ms)^\[[^]\r\n]+](?:(?!^\[[^]\r\n]+]).)*

Or as a Java string literal:

"(?ms)^\\[[^]\r\n]+](?:(?!^\\[[^]\r\n]+]).)*"

A (short) explanation:

(?ms)          // enable multi-line and dot-all matching
^              // the start of a line
\[             // match a '['
[^]\r\n]+      // match any character except ']', '\r' and '\n', one or more times
]              // match a ']'
(?:            // open non-capturing group 1
  (?!          //   start negative look-ahead
    ^          //     the start of a line
    \[         //     match a '['
    [^]\r\n]+  //     match any character except ']', '\r' and '\n', one or more times
    ]          //     match a ']'
  )            //   stop negative look-ahead
  .            //   any character (including line terminators)
)*             // close non-capturing group 1 and match it zero or more times

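In plain English, it reads:

Match a '[' followed by one or more characters other than ']', '\r' and '\n', and then a ']' (call this match X). Then, at every position in the text, first look ahead: if match X does not begin there at the start of a line, match any single character.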
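As a minimal sketch of how the regex above could be applied, the following collects every section block rather than only the first. The class name `IniSections` and the method `sectionBlocks` are hypothetical names chosen for illustration, and the sample text in `main` is a shortened stand-in for the input file. The only change to the regex itself is that `]` is escaped inside the character classes, which should be equivalent and is a little easier to read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IniSections {
    // The answer's regex with ']' escaped inside the character classes:
    // a section header, then every character up to (but not including)
    // the next header that starts a line.
    private static final Pattern SECTION = Pattern.compile(
            "(?ms)^\\[[^\\]\r\n]+](?:(?!^\\[[^\\]\r\n]+]).)*");

    // Returns each section block (header plus the lines below it) in file order.
    static List<String> sectionBlocks(String iniText) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SECTION.matcher(iniText);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Hypothetical sample with Windows line separators.
        String ini = "[Section 2]\r\nKey 2.0=Value 2.0\r\n\r\n"
                   + "[Section 1]\r\nKey 1.1=Value 1.1\r\n";
        for (String block : sectionBlocks(ini)) {
            System.out.println("--- block ---");
            System.out.print(block);
        }
    }
}
```

Each block keeps its trailing blank line, because the look-ahead only stops the match at the next header. Calling `trim()` on a block would remove it if that is not wanted.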

score 0 · Answer

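You are using the greedy quantifier *, which matches the longest possible string. Use the reluctant quantifier *? to get the shortest match.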
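To illustrate the general difference between the two quantifiers, here is a small sketch on a made-up string, not on the regex from the question. The class `QuantifierDemo` and the helper `firstMatch` are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "[a][b]"; // hypothetical sample text

        // Greedy: .* takes as much as it can, then backs off to the last ']'.
        System.out.println(firstMatch("\\[.*\\]", input));

        // Reluctant: .*? takes as little as it can, stopping at the first ']'.
        System.out.println(firstMatch("\\[.*?\\]", input));
    }
}
```

A reluctant quantifier only expands as far as the rest of the pattern forces it to, so when nothing follows it in the pattern it matches zero repetitions.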
