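When I can't tell at a glance what a regular expression does, I split it across several lines so it is easier to see what is going on. Mismatched parentheses become more obvious, and you can even add comments. Let's also put the Java code around it, so that any oddities from string escaping are out in the open.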
String regex = "^" +
"(\"[^,\"]*\")" +
"(," +
"(\"[^,\"]*\")" +
")*" +
"(." +
"(\"[^,\"]*\")" +
"(," +
"(\"[^,\"]*\")" +
")" +
")*" +
".$";
final String QUOTED_VALUE = "\"[^\"]*\""; // A double quote character, zero or more non-double quote characters, and another double quote
String regex = "^" + // The beginning of the string
"(" + QUOTED_VALUE + ")" + // Capture the first value
"(," + // Start a group, a comma
"(" + QUOTED_VALUE + ")" + // Capture the next value
")*" + // Close the group. Allow zero or more of these
"(." + // Start a group, any character
"(" + QUOTED_VALUE + ")" + // Capture another value
"(," + // Start a nested group, a comma
"(" + QUOTED_VALUE + ")" + // Capture the next value
")" + // Close the nested group
")*" + // Close the group. Allow zero or more
".$"; // Any character, the end of the input
final String QUOTED_VALUE = "\"[^\"]*\""; // A double quote character, zero or more non-double quote characters, and another double quote
final String NEWLINE = "(?:\r\n|\n|\r)"; // A line break for (almost) any OS: Windows (\r\n), *NIX (\n) or classic Mac (\r); non-capturing, so it does not shift the value groups
String regex = "^" + // The beginning of the string
"(" + QUOTED_VALUE + ")" + // Capture the first value
"(?:," + // Start a non-capturing group, a comma
"(" + QUOTED_VALUE + ")" + // Capture the next value
")*" + // Close the group. Allow zero or more of these
"(?:" + NEWLINE + // Start a non-capturing group, a newline
"(" + QUOTED_VALUE + ")" + // Capture another value
"(?:," + // Start a nested non-capturing group, a comma
"(" + QUOTED_VALUE + ")" + // Capture the next value
")" + // Close the nested group
")*" + // Close the group. Allow zero or more
NEWLINE + "$"; // A trailing newline, the end of the input
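From here, I can see that you are doing the same work twice. Let's fix that. Doing so also fixes a missing * in the original regular expression. See if you can find it.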
final String QUOTED_VALUE = "\"[^\"]*\""; // A double quote character, zero or more non-double quote characters, and another double quote
final String NEWLINE = "(?:\r\n|\n|\r)"; // A line break for (almost) any OS: Windows (\r\n), *NIX (\n) or classic Mac (\r); non-capturing, so it does not shift the value groups
final String LINE = "(" + QUOTED_VALUE + ")" + // Capture the first value
"(?:," + // Start a group, a comma
"(" + QUOTED_VALUE + ")" + // Capture the next value
")*"; // Close the group. Allow zero or more of these
String regex = "^" + // The beginning of the string
LINE + // Read the first line, capture its values
"(?:" + NEWLINE + // Start a group for the remaining lines
LINE + // Read more lines, capture their values
")*" + // Close the group. Allow zero or more
NEWLINE + "$"; // A trailing newline, the end of the input
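As a quick sanity check, the refactored pattern can be compiled and run against a small input. This is only a sketch: the class name CsvRegexCheck and the sample data are made up for illustration, and NEWLINE is written here as a non-capturing group that also accepts a lone \r, which is an assumption of this sketch rather than something the pattern requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample input are not from the question.
class CsvRegexCheck {
    static final String QUOTED_VALUE = "\"[^\"]*\"";   // "...", with no double quote inside
    static final String NEWLINE = "(?:\r\n|\n|\r)";   // Assumption: non-capturing line break
    static final String LINE = "(" + QUOTED_VALUE + ")" +   // Capture the first value
            "(?:," +                                        // Start a group, a comma
            "(" + QUOTED_VALUE + ")" +                      // Capture the next value
            ")*";                                           // Zero or more of these
    static final Pattern REGEX = Pattern.compile(
            "^" + LINE +                  // The first line
            "(?:" + NEWLINE + LINE + ")*" + // The remaining lines
            NEWLINE + "$");               // A trailing newline, the end of the input

    public static void main(String[] args) {
        String input = "\"a\",\"b\",\"c\"\n\"d\",\"e\",\"f\"\n";
        Matcher m = REGEX.matcher(input);
        if (m.matches()) {
            // A repeated capturing group only keeps its last repetition.
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("group " + i + ": " + m.group(i));
            }
        } else {
            System.out.println("no match");
        }
    }
}
```

The input matches, but only four groups come back: "a", "c", "d" and "f" (each still wrapped in its quotes). The values "b" and "e" are lost because a capturing group inside a repetition keeps only its last match, so the pattern can validate the input but cannot hand back every value.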
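1) I said earlier that handling the newlines inside the regular expression makes it break more easily. One reason: how do you determine how many values each line has? Hard-coding the count would work, but it breaks as soon as your input changes. Maybe that is not a problem for you, but it is still bad practice. Another reason: the regular expression is still too complex for my taste. You could really stop at LINE and apply it to one line at a time.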
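Stopping at LINE could look roughly like the sketch below: ordinary code splits the input into lines, LINE validates each line, and a second pass with QUOTED_VALUE collects however many values the line happens to contain. The class and method names (CsvLineParser, parseLine, parse) are hypothetical, and the sketch assumes that quoted values never contain a line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not taken from the question.
class CsvLineParser {
    static final String QUOTED_VALUE = "\"[^\"]*\"";
    // One whole line: a quoted value followed by zero or more ",<quoted value>".
    static final Pattern LINE = Pattern.compile(QUOTED_VALUE + "(?:," + QUOTED_VALUE + ")*");
    static final Pattern VALUE = Pattern.compile(QUOTED_VALUE);

    // Validates a single line, then returns its values without the surrounding quotes.
    static List<String> parseLine(String line) {
        if (!LINE.matcher(line).matches()) {
            throw new IllegalArgumentException("Not a line of quoted values: " + line);
        }
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(line);
        while (m.find()) {                       // One iteration per value, any count
            String quoted = m.group();
            values.add(quoted.substring(1, quoted.length() - 1));
        }
        return values;
    }

    // Splits on \r\n, \n or \r and parses every non-empty line.
    // Assumption: no quoted value contains a line break.
    static List<List<String>> parse(String input) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : input.split("\r\n|\n|\r")) {
            if (!line.isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```

With this split, the number of values per line no longer has to be known in advance, and a line that does not consist purely of quoted values is rejected with an exception instead of silently producing odd groups.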
2) CSV files allow lines like this:
"some text","123",456,"some more text"