java - BufferedOutputStream not working as expected with Korean characters

Question

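I am trying to write Korean characters to a file, but the file ends up showing garbled data, and I have to work around that to get the Korean text to display when I open the file as a CSV. How can I meet this requirement without a workaround that decodes back to UTF-8 in order to display the Korean data?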

    File localExport = File.createTempFile("char-test", ".csv");
    try (
            FileOutputStream fos = new FileOutputStream(localExport);
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            OutputStreamWriter outputStreamWriter =
                    new OutputStreamWriter(bos, StandardCharsets.UTF_8)
    ) {
        ArrayList<String> rows = new ArrayList<>();
        rows.add("\"가짜 사용자\",사용자123,saint1_user123");
        rows.add("\"페이크유저루노도스트레스 성도1\",saint1_user1");
        for (int i=0; i<2; i++) {
            String csvUserStr = rows.get(i);
            outputStreamWriter.write(csvUserStr);
        }
    }

It writes the following data instead of what I actually wrote to the file.

score 2 · Accepted Answer

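There is absolutely nothing wrong with your Java code. You are writing those characters, including the Korean ones, exactly as written.

What tool are you using to view this file?

That tool is what's broken. Tell it that the file is UTF-8. If you can't, use a better tool, or find out which encoding it reads and update your Java code to match.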

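Note that CSV files, text files, and the like do not store the encoding that was used to write them. Every program that reads or writes such a file simply has to know which encoding it is, and there is no real way to know other than being told.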
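To illustrate the point, here is a minimal sketch (the class name `DecodeDemo` and its helper are made up for this example). It takes the UTF-8 bytes of a Korean string and decodes them twice: once as UTF-8 and once as ISO-8859-1, which stands in here for whatever wrong default a viewer might assume. The bytes are identical in both cases; only the reader's assumption differs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Hypothetical helper: interpret raw bytes using the charset the reader assumes.
    static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        String original = "가짜 사용자";
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        // Reader that was told the file is UTF-8: the text comes back intact.
        System.out.println(decode(onDisk, StandardCharsets.UTF_8));

        // Reader that guesses a single-byte Western encoding: every byte
        // becomes its own character, so the Korean text turns into garbage.
        System.out.println(decode(onDisk, StandardCharsets.ISO_8859_1));
    }
}
```

Nothing in the byte array says which of the two interpretations is the intended one, which is why the viewer has to be told.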

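Update: from the comments, it appears that "the tool reading this" is Excel.

When you use the "Import CSV" dialog, Excel asks for the encoding of the file. Choose UTF-8 in the dropdown. The label depends on your version and OS, but it is usually called "File origin".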

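If you want your clients not to have to change the defaults, that is a problem: the default is usually something like MacRoman or Windows-1252, and with an encoding like that it is simply impossible to produce Korean characters. They are just not in that character set.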
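This can be checked from Java with a small sketch (the class name `CanEncodeDemo` is made up, and it assumes the JDK in use provides the `windows-1252` charset, which standard JDK builds normally do):

```java
import java.nio.charset.Charset;

final class CanEncodeDemo {
    // Hypothetical helper: can every character of `text` be represented in the named charset?
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String korean = "가짜 사용자";
        // Single-byte Western code page: Hangul has no mapping at all.
        System.out.println("windows-1252: " + canEncode("windows-1252", korean));
        // UTF-8 can represent every Unicode character.
        System.out.println("UTF-8: " + canEncode("UTF-8", korean));
    }
}
```

So changing the Java side to write in such a default encoding is not an option for Korean text; the data would be lost at write time.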

If you want a "once and for all" approach, generate the Excel file yourself, for example with Apache POI.

score 1 · Accepted Answer

CSV files don't have any means to carry encoding information "in-band"—in the file itself. I'm guessing the default character encoding used for Excel CSV imports is the system default, so if that isn't Korean, they will have to specify the encoding when they import the CSV. If your client requires CSV, they have no choice but to accept that behavior.

However, if their requirement is to open your file in Excel (and not that the file has to be CSV format), you could write an Excel spreadsheet instead. The various Excel file formats do include character encoding information, so they would be able to open the file without manually specifying the encoding.

Library recommendations are off-topic, but libraries such as Apache POI make writing simple Excel sheets fairly easy. There are additional benefits as well, such as taking care of any necessary escaping for you, so that your file doesn't repeatedly break when unanticipated values are included in the spreadsheet.

score 0 · Accepted Answer

As already mentioned, Excel cannot detect that the text is encoded in UTF-8. One solution is to write an invisible BOM character as the very first character:

  outputStreamWriter.write("\uFEFF");
  for...
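A fuller sketch of the BOM approach, under some assumptions: the class `BomDemo` and method `csvWithBom` are made-up names, the output goes to an in-memory buffer rather than the temp file from the question so that the resulting bytes are easy to inspect, and each row is terminated with `\r\n`, which the question's code does not do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class BomDemo {
    // Hypothetical helper: encode rows as UTF-8 CSV text preceded by a BOM.
    static byte[] csvWithBom(List<String> rows) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(buffer, StandardCharsets.UTF_8)) {
            writer.write("\uFEFF");      // U+FEFF is encoded as the bytes EF BB BF
            for (String row : rows) {
                writer.write(row);
                writer.write("\r\n");    // assumption: one CSV record per line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();     // writer is closed and flushed at this point
    }

    public static void main(String[] args) {
        byte[] bytes = csvWithBom(Arrays.asList("\"가짜 사용자\",사용자123,saint1_user123"));
        System.out.printf("%02X %02X %02X%n", bytes[0], bytes[1], bytes[2]);
    }
}
```

The same `write("\uFEFF")` call works unchanged on the `OutputStreamWriter` from the question, as long as it is the first thing written.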

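For the various UTF encodings, this BOM is generally a redundant and ugly marker.

By the way, take a look at the Files class (java.nio.file.Files); it can reduce the code to a single line.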
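As a sketch of what that could look like, `Files.write` accepts a list of lines plus a charset and writes each line followed by the platform line separator. The class and method names below (`FilesDemo`, `writeCsv`, `roundTrip`) are invented for the example, and the temp file mirrors the one in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class FilesDemo {
    // Hypothetical helper: write all rows to a temp CSV file as UTF-8.
    static Path writeCsv(List<String> rows) {
        try {
            Path file = Files.createTempFile("char-test", ".csv");
            // The actual one-liner: every row becomes one line in the file.
            return Files.write(file, rows, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: write the rows, then read them back as UTF-8.
    static List<String> roundTrip(List<String> rows) {
        try {
            return Files.readAllLines(writeCsv(rows), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "\"가짜 사용자\",사용자123,saint1_user123",
                "\"페이크유저루노도스트레스 성도1\",saint1_user1");
        System.out.println(roundTrip(rows).equals(rows));
    }
}
```

Note that this variant does not write a BOM; if Excel should pick up UTF-8 automatically, the first row would need to start with `\uFEFF`.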
