In Java 7, read the text as "Windows-1252", which is Windows Latin-1:
Path oldPath = Paths.get("C:/Temp/old.txt");
Path newPath = Paths.get("C:/Temp/new.txt");
byte[] bytes = Files.readAllBytes(oldPath);
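// Decode as Windows-1252 and prepend a BOM (U+FEFF) so that Notepad detects UTF-8.
String content = "\uFEFF" + new String(bytes, "Windows-1252");
bytes = content.getBytes("UTF-8");
// With no options given, Files.write creates the file or truncates an existing one.
Files.write(newPath, bytes);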
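This takes the bytes and interprets them as Windows Latin-1. The trick for Notepad is that Notepad recognizes the encoding from a leading BOM character: U+FEFF, a zero-width no-break space, which is normally not used in UTF-8 files.
The UTF-8 encoding is then taken from the String.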
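To make the effect of the BOM visible, here is a minimal sketch that prints the resulting bytes in hex. The class name `BomDemo` and the helper `toUtf8WithBom` are made up for this illustration; they are not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomDemo {
    /** Hypothetical helper: decodes Windows-1252 bytes, re-encodes as UTF-8 with a leading BOM. */
    static byte[] toUtf8WithBom(byte[] cp1252Bytes) {
        String content = "\uFEFF" + new String(cp1252Bytes, Charset.forName("windows-1252"));
        return content.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly-quoted "Hi" as Windows-1252 bytes: 0x93 and 0x94 are the curly quotes.
        byte[] input = { (byte) 0x93, 'H', 'i', (byte) 0x94 };
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8WithBom(input)) {
            hex.append(String.format("%02X ", b));
        }
        // Prints: EF BB BF E2 80 9C 48 69 E2 80 9D
        System.out.println(hex.toString().trim());
    }
}
```

The first three bytes, EF BB BF, are the UTF-8 encoding of U+FEFF; that is the signature Notepad looks for.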
Windows-1252 is ISO-8859-1 (pure Latin-1) but with some special characters, such as curly quotes, in the range 0x80 - 0x9F.
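The difference between the two charsets can be checked with a small sketch like the following. The class name `Cp1252VsLatin1` and the helper `decode` are invented for this example.

```java
import java.nio.charset.Charset;

class Cp1252VsLatin1 {
    /** Hypothetical helper: decodes a single byte with the named charset. */
    static String decode(int byteValue, String charsetName) {
        return new String(new byte[] { (byte) byteValue }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0x93 is a left curly quote in Windows-1252 but a C1 control character in ISO-8859-1.
        System.out.printf("U+%04X%n", (int) decode(0x93, "windows-1252").charAt(0)); // U+201C
        System.out.printf("U+%04X%n", (int) decode(0x93, "ISO-8859-1").charAt(0));   // U+0093
        // Outside the special range the two agree, e.g. 0xE9 (e with acute accent).
        System.out.println(decode(0xE9, "windows-1252").equals(decode(0xE9, "ISO-8859-1"))); // true
    }
}
```

So decoding a Windows file as ISO-8859-1 would silently turn curly quotes and similar characters into invisible control characters, which is why the code above names Windows-1252 explicitly.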
In Java 6:
File oldPath = new File("C:/Temp/old.txt");
File newPath = new File("C:/Temp/new.txt");
long longLength = oldPath.length();
if (longLength > Integer.MAX_VALUE) {
    throw new IllegalArgumentException("File too large: " + oldPath.getPath());
}
int fileSize = (int)longLength;
byte[] bytes = new byte[fileSize];
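// A single read() may return fewer bytes than requested; readFully() fills the
// whole array or throws an EOFException.
DataInputStream in = new DataInputStream(new FileInputStream(oldPath));
try {
    in.readFully(bytes);
} finally {
    in.close();
}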
String content = "\uFEFF" + new String(bytes, "Windows-1252");
bytes = content.getBytes("UTF-8");
OutputStream out = new FileOutputStream(newPath);
try {
    out.write(bytes);
} finally {
    out.close(); // close the stream even if the write fails
}