java - Univocity Parser: TextParsingException while parsing a line that has a starting double quote (") but no ending double quote (")

Question

I get the following exception while parsing a file:

com.univocity.parsers.common.TextParsingException: Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (4096). 
Identified line separator characters in the parsed content. This may be the cause of the error. The line separator in your parser settings is set to '\r\n'. Parsed content: The quick brown fox jumps over the lazy dog.|[\n]

File contents:

1234|5678|The quick brown fox jumps over the lazy dog.|
1234|5678|"The quick brown fox jumps over the lazy dog.|
1234|5678|The quick brown fox jumps over the lazy dog.|
1234|5678|The quick brown fox jumps over the lazy dog.|
1234|5678|The quick brown fox jumps over the lazy dog.|
.........
.........
1234|5678|The quick brown fox jumps over the lazy dog.|

I am using the following CSV parser settings:

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setLineSeparatorDetectionEnabled(true);
parserSettings.getFormat().setDelimiter('|');
parserSettings.setIgnoreLeadingWhitespaces(true);
parserSettings.setIgnoreTrailingWhitespaces(true);
parserSettings.setHeaderExtractionEnabled(false);
parserSettings.setMaxCharsPerColumn(4096);

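From the exception I can infer that the second line has a starting double quote (") but does not end with a closing double quote ("). As a result, the parser treats everything after that quote as a single column value that runs on toward EOF (end of file), and it exceeds the 4096-character column limit.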

Tested with build: 2.2.2

score 3 · Accepted Answer

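This is how a CSV parser is supposed to work. If a quote is found, it is there because the content after it may contain delimiters, line endings, or other (hopefully escaped) quotes.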

The only way to handle this situation in your case is to do the following:

parserSettings.getFormat().setQuote('\0');

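This makes the parser ignore quotes and process every value as an unquoted value. Once a line ending or a delimiter is found, the value is collected as you expect.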
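To illustrate why the unclosed quote swallows the rest of the file, and why disabling the quote character avoids it, here is a minimal sketch of a quote-aware splitter. It is not Univocity's implementation: the class `QuoteSketch` and its `parse` method are hypothetical names made up for this example, and the sketch assumes '\n' line endings, no escaped quotes, and no NUL characters in the input.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSketch {

    // Hypothetical helper, not part of Univocity.
    // Splits input into rows of fields. A field that starts with the quote
    // character is read up to the next quote character, so delimiters and
    // line breaks inside it are kept as data. Passing '\0' as the quote
    // means no character in the input ever matches it (assuming the input
    // contains no NUL characters), so every character is taken literally.
    static List<List<String>> parse(String input, char delimiter, char quote) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    inQuotes = false;      // closing quote found
                } else {
                    field.append(c);       // delimiters and newlines are data here
                }
            } else if (c == quote && field.length() == 0) {
                inQuotes = true;           // opening quote at the start of a field
            } else if (c == delimiter) {
                row.add(field.toString()); // end of field
                field.setLength(0);
            } else if (c == '\n') {
                row.add(field.toString()); // end of field and end of row
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                field.append(c);
            }
        }
        // Flush whatever is left when the input ends, including an
        // unterminated quoted field.
        if (inQuotes || field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "1234|5678|The quick brown fox.|\n"
                    + "1234|5678|\"The quick brown fox.|\n"
                    + "1234|5678|The quick brown fox.|\n";

        // Quote handling on: the second row absorbs the third one.
        System.out.println(parse(data, '|', '"').size());   // 2
        // Quote handling off: each line is its own row.
        System.out.println(parse(data, '|', '\0').size());  // 3
    }
}
```

With quote handling on, the last field of the second row keeps growing until the input ends, which is the same pattern that trips the 4096-character column limit in the question. With '\0' as the quote, the stray " simply stays in the value as an ordinary character.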

