encoding - Why does writing a file with UTF-16LE in Groovy produce a BOM?

Question

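Do you know why the first and second lines below do not write a BOM to the file, but the third line does? I believe UTF-16LE is the correct encoding name, and that this encoding does not automatically write a BOM at the start of the file.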

new File("foo-wo-bom.txt").withPrintWriter("utf-16le") {it << "test"}
new File("foo-bom1.txt").withPrintWriter("UnicodeLittleUnmarked") {it << "test"}
new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"}

Another example

new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"}
new File("foo-bom.txt").getBytes().each {System.out.format("%02x ", it)}

prints

ff fe 74 00 65 00 73 00 74 00
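The leading ff fe in that output is a little-endian byte-order mark. It matters when the file is read back: according to the java.nio.charset.Charset documentation, the UTF-16LE charset decodes an initial BOM as an ordinary ZERO WIDTH NO-BREAK SPACE (U+FEFF), while the UTF-16 charset uses the BOM to choose the byte order and does not return it. A minimal Java sketch, assuming the bytes above are the whole file (the class and method names are illustrative only):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16BomDecode {
    // The bytes printed above: BOM (ff fe) followed by "test" in UTF-16LE.
    static final byte[] GROOVY_OUTPUT = {
        (byte) 0xff, (byte) 0xfe, 0x74, 0x00, 0x65, 0x00, 0x73, 0x00, 0x74, 0x00
    };

    // Decodes the sample bytes with the given charset.
    static String decode(Charset cs) {
        return new String(GROOVY_OUTPUT, cs);
    }

    public static void main(String[] args) {
        // UTF-16LE keeps the BOM as a leading U+FEFF character.
        String asLe = decode(StandardCharsets.UTF_16LE);
        // UTF-16 consumes the BOM and uses it to detect little-endian order.
        String asUtf16 = decode(StandardCharsets.UTF_16);

        System.out.println(asLe.length() + " " + (asLe.charAt(0) == '\uFEFF')); // 5 true
        System.out.println(asUtf16.length() + " " + asUtf16);                  // 4 test
    }
}
```

So a reader that opens this file as "UTF-16LE" sees an extra invisible character at the start of the text.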

and in Java

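import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;

public class Utf16LeTest {
    public static void main(String[] args) throws IOException {
        // Write "test" with the UTF-16LE charset via the plain Java constructor.
        PrintWriter w = new PrintWriter("foo.txt","UTF-16LE");
        w.print("test");
        w.close();
        // Dump the file's raw bytes as hex.
        FileInputStream r = new FileInputStream("foo.txt");
        int c;
        while ((c = r.read()) != -1) {
            System.out.format("%02x ",c);
        }
        r.close();
    }
}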

prints

74 00 65 00 73 00 74 00

Java does not write a BOM, but Groovy does.
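For comparison, here is a small Java sketch of what the JDK charsets themselves do when encoding, independent of Groovy. Per the java.nio.charset.Charset documentation, "UTF-16" writes a big-endian BOM, while "UTF-16LE" writes none; "UnicodeLittleUnmarked" is the older java.io name for the same UTF-16LE charset, and charset-name lookup is case-insensitive. The class and helper names are illustrative only.

```java
import java.nio.charset.Charset;

public class Utf16BomDemo {
    // Encodes text with the named charset and returns the bytes as lowercase hex pairs.
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "UTF-16" writes a big-endian BOM (fe ff), then big-endian code units.
        System.out.println("UTF-16:                " + hex("test", "UTF-16"));
        // "UTF-16LE", its lowercase spelling, and its alias all write no BOM.
        System.out.println("UTF-16LE:              " + hex("test", "UTF-16LE"));
        System.out.println("utf-16le:              " + hex("test", "utf-16le"));
        System.out.println("UnicodeLittleUnmarked: " + hex("test", "UnicodeLittleUnmarked"));
    }
}
```

All three UTF-16LE spellings should print 74 00 65 00 73 00 74 00, which suggests the extra ff fe in the Groovy output comes from Groovy rather than from the JDK encoder.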

Accepted Answer

The behavior seems to differ with withPrintWriter. Try this in your GroovyConsole:

File file = new File("tmp.txt")
try {
    String text = " "
    String charset = "UTF-16LE"

    file.withPrintWriter(charset) { it << text }
    println "withPrintWriter"
    file.getBytes().each { System.out.format("%02x ", it) }

    PrintWriter w = new PrintWriter(file, charset)
    w.print(text)
    w.close()
    println "\n\nnew PrintWriter"
    file.getBytes().each { System.out.format("%02x ", it) }
} finally {
    file.delete()
}

It outputs

withPrintWriter
ff fe 20 00

new PrintWriter
20 00

This is because new PrintWriter calls the Java constructor directly, whereas withPrintWriter eventually calls org.codehaus.groovy.runtime.ResourceGroovyMethods.writeUTF16BomIfRequired(), which writes the BOM.
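The question also observed that only the exact spelling "UTF-16LE" produced a BOM, while "utf-16le" and "UnicodeLittleUnmarked" did not. One explanation consistent with that is a check that compares the charset name string exactly (case-sensitive, no alias resolution) before writing ff fe. The Java sketch below emulates that assumed rule; writeBomIfRequired and the exact-match logic are hypothetical and are not copied from Groovy's source.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class GroovyStyleBom {
    // Hypothetical helper: writes a UTF-16 BOM only when the charset *name string*
    // matches exactly. This is an assumed rule, not Groovy's actual implementation.
    static void writeBomIfRequired(String charset, OutputStream out) throws IOException {
        if ("UTF-16LE".equals(charset)) {
            out.write(0xFF);
            out.write(0xFE);
        } else if ("UTF-16BE".equals(charset)) {
            out.write(0xFE);
            out.write(0xFF);
        }
    }

    // Mimics a withPrintWriter-style call: optional BOM first, then the encoded text.
    static byte[] write(String text, String charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeBomIfRequired(charset, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The JDK encoder itself never adds a BOM for UTF-16LE.
        try (PrintWriter w = new PrintWriter(new OutputStreamWriter(bytes, Charset.forName(charset)))) {
            w.print(text);
        }
        return bytes.toByteArray();
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String cs : new String[] {"UTF-16LE", "utf-16le", "UnicodeLittleUnmarked"}) {
            System.out.println(cs + ": " + hex(write("test", cs)));
        }
    }
}
```

Under this assumed rule, only "UTF-16LE" prints ff fe 74 00 65 00 73 00 74 00; the other two names print 74 00 65 00 73 00 74 00, matching the three lines in the question.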

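I'm not sure whether this difference in behavior is intentional. I was curious about it, so I asked on the mailing list; the people there should know the history behind the design.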

Edit: GROOVY-7465 was created based on that discussion.
