I want to split a string like this on commas:
field1:"value1", field2:"value2", field3:"value3,value4"
into a string[]
that looks like:
0 field1:"value1"
1 field2:"value2"
2 field3:"value3,value4"
I'm trying to do this with Regex.Split, but I can't work out the regular expression.
For this, Matches is much easier to use than Split, for example:
// Requires: using System.Linq; using System.Text.RegularExpressions;
string[] asYouWanted = Regex.Matches(input, @"[A-Za-z0-9]+:"".*?""")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
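That said, if your values (or fields!) might contain escaped quotes (or anything similarly tricky), you are probably better off with a proper CSV parser.
If you do have escaped quotes inside the values, I think the following regex works. Give it a test:
Regex.Matches(
    @"field3:""value3\\"",value4""",
    @"[A-Za-z0-9]+:"".*?(?<=(?<!\\)(\\\\)*)""")
The added (?<=(?<!\\)(\\\\)*) should ensure that the " which ends the match is preceded only by an even number of backslashes, since an odd number of backslashes means the quote itself is escaped.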
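To make the escape-aware pattern easier to try out, here is a minimal sketch that wraps it in a helper. The names FieldSplitter and SplitFields are made up for illustration and are not from the answer; the pattern itself is the one quoted above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class FieldSplitter
{
    // A field name, a colon, then a quoted value. The lookbehind only lets
    // the match end on a quote preceded by an even number of backslashes
    // (zero included), i.e. on a quote that is not itself escaped.
    private static readonly Regex FieldPattern =
        new Regex(@"[A-Za-z0-9]+:"".*?(?<=(?<!\\)(\\\\)*)""");

    public static string[] SplitFields(string input)
    {
        return FieldPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

Called on the string from the question, SplitFields should return the three field:"value" pieces, and an input such as a:"x\"y", b:"z" should come back as two pieces with the escaped quote left inside the first one. Anything more irregular than that is still a case for a real parser.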
Untested, but this should be fine:
string[] parts = input.Split(new string[] { "\"," }, StringSplitOptions.None);
Remember to add the " back at the end of each part if you need it.
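string[] arr = str.Split(new string[] { "\"," }, StringSplitOptions.None)
    // The last piece keeps its own closing quote, so strip it before re-adding one.
    .Select(s => s.Trim().TrimEnd('"') + "\"")
    .ToArray();
Split on ", as webnoob mentioned, then use Select to trim each piece and append the trailing ", then convert to an array.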
Try this:
// (\w.+?):"(\w.+?)"
//
// Match the regular expression below and capture its match into backreference number 1 «(\w.+?)»
//    Match a single character that is a “word character” (letters, digits, and underscores) «\w»
//    Match any single character that is not a line break character «.+?»
//       Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the characters “:"” literally «:"»
// Match the regular expression below and capture its match into backreference number 2 «(\w.+?)»
//    Match a single character that is a “word character” (letters, digits, and underscores) «\w»
//    Match any single character that is not a line break character «.+?»
//       Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
// Match the character “"” literally «"»
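// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
List<string> results = new List<string>();
try {
    Regex regObj = new Regex(@"(\w.+?):""(\w.+?)""");
    Match matchResults = regObj.Match(sourceString);
    // Walk through every match; the number of matches is not known up front.
    while (matchResults.Success) {
        results.Add(matchResults.Value);
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
string[] arr = results.ToArray();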
Here is the simplest built-in way. I checked it and it works fine: it splits "Hai,\"Hello,World\"" into {"Hai","Hello,World"}.
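The code for this last answer is not shown, so which built-in it means is unknown. One built-in that does produce this split is TextFieldParser from Microsoft.VisualBasic.FileIO. The sketch below assumes that is the kind of thing meant; the names QuotedCsv and SplitLine are made up for illustration, and the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, for illustration only.
public static class QuotedCsv
{
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat a comma inside a "quoted" field as data, not as a delimiter.
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

With this, QuotedCsv.SplitLine("Hai,\"Hello,World\"") gives the two fields Hai and Hello,World. Note that it expects the quotes to wrap a whole field, which is not the case in the original question, where the quotes start after field1: and so on, so treat it as a fit for plain quoted CSV rather than for that exact format.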