java - MalformedInputException when decoding UTF-8 text in Java

Question

I read text from a DB and write it to a file using a UTF-8 encoder, as shown below:

csvBufWr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fname), Charset.forName("UTF-8").newEncoder()), (int) buffersize);
csvBufWr.write(recordtoinsert);
csvBufWr.newLine();

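Depending on the records, this file is then merged with another file (from a different system that I have no control over) by a shell script. After the merge, I have to create an Excel sheet using Apache POI, so I read the merged file as shown below and write it to the Excel sheet.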

CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
//decoder.onMalformedInput(CodingErrorAction.IGNORE);

csvBufRdr = new BufferedReader(new InputStreamReader(new FileInputStream(pathAndFileName), decoder));
// read the file line by line, parse the record and write them
// to the XL file
while ((line = csvBufRdr.readLine()) != null) {
    if (!line.isEmpty() && line.length() > 8) {
        parseAndWrite2Sheet(line, sheet, workBook, sheetName);
    }
}

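However, after reading a seemingly random number of lines, I get a MalformedInputException at line = csvBufRdr.readLine(). I have checked the file being read carefully and there do not seem to be any odd characters. Even if I delete the line where the exception occurs, together with the lines just above and below it, I still get the exception at the same line number. Adding decoder.onMalformedInput(CodingErrorAction.IGNORE) seems to solve the problem, but everyone is worried that we might be dropping a record or a character, which would be unacceptable.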
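One way to find out what is actually being rejected is to scan the raw bytes with a strict decoder and report the byte offset of the first invalid sequence. Because the reader decodes in buffered chunks, the line at which readLine() throws is not necessarily the line that contains the bad bytes, so a byte offset may be more useful than a line number. The following is a minimal sketch; the class name Utf8Scanner and the method firstMalformedOffset are hypothetical helpers, not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper (not from the original post).
final class Utf8Scanner {

    /**
     * Returns the byte offset of the first sequence that is not valid UTF-8,
     * or -1 if the whole array decodes cleanly.
     */
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On an error the input position is left at the start
                // of the offending byte sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and keep scanning
        }
    }
}
```

The bytes of the merged file could be loaded with Files.readAllBytes(Paths.get(pathAndFileName)) and passed to this method; a hex dump around the returned offset would then show which bytes the decoder refuses.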

I compared the generated Excel sheet with the input file and there seems to be no difference. Can anyone point out why this happens?

Is it because of the merge on Linux? AFAIK Linux handles UTF files by default, so it seems unlikely to be the cause.
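One possibility worth ruling out, purely as an assumption since the encoding of the other system's file is not stated, is that the second file is not UTF-8 (for example ISO-8859-1). A byte-level merge would then produce a file that is only partly valid UTF-8. The sketch below simulates that case with made-up sample records and shows how the three CodingErrorAction settings behave; MixedEncodingDemo, decode and mergedSample are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo (not from the original post); the records are invented.
final class MixedEncodingDemo {

    /** Decodes the bytes as UTF-8 using the given action for bad input. */
    static String decode(byte[] data, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Only reachable with CodingErrorAction.REPORT.
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a byte-level merge of a UTF-8 line and an ISO-8859-1 line. */
    static byte[] mergedSample() {
        byte[] utf8Line = "1,caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        byte[] latin1Line = "2,caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        merged.write(utf8Line, 0, utf8Line.length);
        merged.write(latin1Line, 0, latin1Line.length);
        return merged.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = mergedSample();
        // IGNORE silently drops the bad byte from the second record.
        System.out.print(decode(data, CodingErrorAction.IGNORE));
        // REPLACE substitutes U+FFFD, so the damaged record stays visible.
        System.out.print(decode(data, CodingErrorAction.REPLACE));
        try {
            decode(data, CodingErrorAction.REPORT);
        } catch (UncheckedIOException e) {
            System.out.println("strict decoding failed: " + e.getCause());
        }
    }
}
```

In this sketch IGNORE loses one character without dropping the record, while REPLACE leaves a U+FFFD marker that can be searched for afterwards. If silent loss is the concern, REPLACE may be a safer stopgap than IGNORE until the real encoding of the second file is known.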

I am at my wits' end!
