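I am trying to find a regular expression that works on CSV files (with double quotes around the values), where a value may contain any character. The expression I am using right now is (in Java, with the backslashes escaped):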

",(?=(([^\"\\\\]|\\\\.)*\"([^\"\\\\]|\\\\.)*\")*([^\"\\\\]|\\\\.)*$)"

The problem I am running into is with entries such as "random_value"" or "random_value\".

Additional information:

"000000000000000","","","","email@yahoo.com","random_value""
"000000000000000","","","","email2@yahoo.com","random_value\"
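To make the failure concrete, here is a minimal sketch that applies the expression above with String.split. The class name SplitDemo and the shortened sample lines are made up for illustration; only the regex is taken from the question. The lookahead accepts a comma only when an even number of unescaped quotes follows it, so a line with one unbalanced quote is not split at all.

```java
import java.util.Arrays;

// Hypothetical demo class; only the regex itself comes from the question.
class SplitDemo {
    // Split on a comma only if an even number of unescaped quotes follows it.
    static final String SPLIT_REGEX =
        ",(?=(([^\"\\\\]|\\\\.)*\"([^\"\\\\]|\\\\.)*\")*([^\"\\\\]|\\\\.)*$)";

    static String[] split(String line) {
        return line.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // Balanced quotes: the comma inside "b,c" is left alone.
        System.out.println(Arrays.toString(split("\"a\",\"b,c\",\"d\"")));
        // Unbalanced trailing quote: no comma qualifies, so nothing is split.
        System.out.println(Arrays.toString(split("\"a\",\"\",\"random_value\"\"")));
    }
}
```

With the trailing quote (or the trailing backslash-quote) in place, every comma is followed by an odd number of unescaped quotes, which is why the whole line comes back as a single element.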

2 Answers

Use JavaCSV

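// Requires the JavaCSV library: import com.csvreader.CsvReader;
// The reader calls below throw IOException, so run them where it is handled.
String str = "\"000000000000000\",\"\",\"\",\"\",\"email2@yahoo.com\",\"random_value\\\"\"";
CsvReader reader = CsvReader.parse(str);
reader.readRecord(); // advance to the first (and only) record
for (int i=0; i<reader.getColumnCount(); i++)
    System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i));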

Output:

Scol[0]: [000000000000000]
Scol[1]: []
Scol[2]: []
Scol[3]: []
Scol[4]: [email2@yahoo.com]
Scol[5]: [random_value\"]
answered 2013-07-20T07:09:55.600

Description

Assuming the source text is cleaned up so that every value has a proper closing quote, this expression will:

  • Match every quoted, comma-delimited value
  • Capture the leading comma (if any), the opening and closing quotes, and the enclosed text in group 0
  • Strip the opening and closing quotes and place the remaining value in capture group 1
  • Allow values to contain escaped quote sequences such as \" and ""

(?:^|,)"((?<=")(?:[^"]*|\\"|"")*?)"(?=[,\r\n]|\Z)

Example

Live Demo: http://www.rubular.com/r/NSSxdHWcDM

Sample Text

"1000000000000000","","","","email1@yahoo.com","1random_value"""
"2000000000000000","","","","email2@yahoo.com","2random_value\""

Capture Groups

[0][0] = "1000000000000000"
[0][1] = 1000000000000000

[1][0] = ,""
[1][1] = 

[2][0] = ,""
[2][1] = 

[3][0] = ,""
[3][1] = 

[4][0] = ,"email1@yahoo.com"
[4][1] = email1@yahoo.com

[5][0] = ,"1random_value"""
[5][1] = 1random_value""

[6][0] = "2000000000000000"
[6][1] = 2000000000000000

[7][0] = ,""
[7][1] = 

[8][0] = ,""
[8][1] = 

[9][0] = ,""
[9][1] = 

[10][0] = ,"email2@yahoo.com"
[10][1] = email2@yahoo.com

[11][0] = ,"2random_value\""
[11][1] = 2random_value\"
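
Since the question is about Java, here is a rough sketch of how the expression might be applied with java.util.regex. The class and method names (QuotedFieldDemo, extract) are hypothetical, and the sample rows are shortened. Pattern.MULTILINE is an assumption on my part: Rubular (Ruby) treats ^ as a line start by default, while Java needs the flag for ^ to match at the beginning of each row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the regex comes from the answer above.
class QuotedFieldDemo {
    // MULTILINE lets ^ match at the start of every line, not just the input.
    static final Pattern FIELD = Pattern.compile(
        "(?:^|,)\"((?<=\")(?:[^\"]*|\\\\\"|\"\")*?)\"(?=[,\\r\\n]|\\Z)",
        Pattern.MULTILINE);

    // Returns capture group 1 (the value without its outer quotes) per match.
    static List<String> extract(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String text =
            "\"1000000000000000\",\"\",\"email1@yahoo.com\",\"1random_value\"\"\"\n"
          + "\"2000000000000000\",\"\",\"email2@yahoo.com\",\"2random_value\\\"\"";
        // Print each captured value in brackets so empty values stay visible.
        extract(text).forEach(v -> System.out.println("[" + v + "]"));
    }
}
```

Note that the escape sequences are returned as written (\" and "" are not unescaped), and that the nested quantifiers can backtrack heavily on long values that never find a valid closing quote, so this is best kept to input that has already been cleaned up as described.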
answered 2013-07-19T23:57:55.300