java - Prevent the Unicode byte order mark from being written in the middle of a file

Question

This code writes two strings to a file channel:

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import static java.nio.file.StandardOpenOption.*;

final byte[] title = "Title: ".getBytes("UTF-16");
final byte[] body = "This is a string.".getBytes("UTF-16");
ByteBuffer titlebuf = ByteBuffer.wrap(title);
ByteBuffer bodybuf = ByteBuffer.wrap(body);
// p is the target java.nio.file.Path; try-with-resources closes the channel
try (FileChannel fc = FileChannel.open(p, READ, WRITE, TRUNCATE_EXISTING)) {
    fc.position(title.length); // second string written first, but not relevant to the problem
    while (bodybuf.hasRemaining()) fc.write(bodybuf);
    fc.position(0);
    while (titlebuf.hasRemaining()) fc.write(titlebuf);
}

Each string is prefixed with a BOM (marked with asterisks below):

[Title: ?T]  *254 255* 0 84 0 105 0 116 0 108 0 101 0 58 0 32 *254 255* 0 84

A BOM at the very beginning of the file is fine, but one in the middle of the stream causes problems.

How can I prevent this?

score 2 · Accepted Answer

The BOM bytes are inserted because you call getBytes with the "UTF-16" charset:

final byte[] title = "Title: ".getBytes("UTF-16");

Check title.length and you will see that it contains 2 extra bytes for the BOM: 16 bytes for the 7 characters of "Title: " instead of 14.
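A quick way to see the extra bytes is to compare the "UTF-16" and "UTF-16BE" encodings of the same string. This is a minimal sketch (the class and method names are illustrative only); it uses StandardCharsets, which also avoids the checked UnsupportedEncodingException that getBytes(String) declares:

```java
import java.nio.charset.StandardCharsets;

public class BomLength {
    // Number of bytes the "UTF-16" charset produces for s, BOM included.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String title = "Title: ";
        byte[] withBom = title.getBytes(StandardCharsets.UTF_16);  // BOM + big-endian data
        byte[] noBom = title.getBytes(StandardCharsets.UTF_16BE);  // big-endian data only
        // 7 chars * 2 bytes = 14, plus 2 BOM bytes = 16
        System.out.println(withBom.length); // 16
        System.out.println(noBom.length);   // 14
        // The first two bytes are the big-endian BOM
        System.out.printf("%02X %02X%n", withBom[0] & 0xFF, withBom[1] & 0xFF); // FE FF
    }
}
```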

So you can either strip the BOM from these arrays before wrapping them in a ByteBuffer, or skip it when writing the ByteBuffer to the file.
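Both options can be sketched as follows. The helpers hasBom, stripBom and wrapSkippingBom are hypothetical names, not library methods: stripBom copies the array without its first two bytes, while wrapSkippingBom leaves the array alone and advances the ByteBuffer past the BOM so the write loop never sees it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StripBom {
    // True if the array starts with a UTF-16 BOM in either byte order.
    static boolean hasBom(byte[] b) {
        return b.length >= 2
            && ((b[0] == (byte) 0xFE && b[1] == (byte) 0xFF)
             || (b[0] == (byte) 0xFF && b[1] == (byte) 0xFE));
    }

    // Option 1: return a copy without the BOM (or the original if there is none).
    static byte[] stripBom(byte[] b) {
        return hasBom(b) ? Arrays.copyOfRange(b, 2, b.length) : b;
    }

    // Option 2: keep the array, but start the buffer after the BOM,
    // so hasRemaining()/FileChannel.write() only see the text bytes.
    static ByteBuffer wrapSkippingBom(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        if (hasBom(b)) buf.position(2);
        return buf;
    }

    public static void main(String[] args) {
        byte[] body = "This is a string.".getBytes(StandardCharsets.UTF_16);
        System.out.println(body.length);                        // 36
        System.out.println(stripBom(body).length);              // 34
        System.out.println(wrapSkippingBom(body).remaining());  // 34
    }
}
```

If you still want a single BOM at the start of the file, keep it on the title (written at offset 0) and strip it only from the body. If you strip the title's BOM as well, fc.position(title.length) must use the shorter, stripped length.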

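Another solution is to use an explicit little-endian or big-endian UTF-16 charset ("UTF-16LE" or "UTF-16BE"), neither of which writes a BOM. If you want the file to start with a BOM, write it once yourself and choose the charset whose byte order matches it (Java's "UTF-16" encoder emits the big-endian BOM FE FF, which pairs with "UTF-16BE"):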

final byte[] title = "Title: ".getBytes("UTF-16LE");
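As a sketch of that approach, the following writes exactly one BOM at offset 0 and encodes both strings with "UTF-16BE", keeping the question's body-first write order. The class name, the writeAll helper and the temporary file are assumptions for illustration only; reading the result back with the "UTF-16" charset consumes the BOM and yields the plain text.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

public class SingleBomWriter {
    // Big-endian BOM, written once at the start of the file.
    static final byte[] BOM_BE = {(byte) 0xFE, (byte) 0xFF};

    static void writeAll(FileChannel fc, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) fc.write(buf);
    }

    static void write(Path p, String titleText, String bodyText) throws IOException {
        byte[] title = titleText.getBytes(StandardCharsets.UTF_16BE); // no BOM
        byte[] body = bodyText.getBytes(StandardCharsets.UTF_16BE);   // no BOM
        try (FileChannel fc = FileChannel.open(p, CREATE, READ, WRITE, TRUNCATE_EXISTING)) {
            fc.position(BOM_BE.length + title.length); // body first, as in the question
            writeAll(fc, ByteBuffer.wrap(body));
            fc.position(0);
            writeAll(fc, ByteBuffer.wrap(BOM_BE));     // the only BOM in the file
            writeAll(fc, ByteBuffer.wrap(title));      // continues at offset 2
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("bom-demo", ".txt");
        write(p, "Title: ", "This is a string.");
        byte[] bytes = Files.readAllBytes(p);
        System.out.println(bytes.length); // 2 + 14 + 34 = 50
        // The "UTF-16" decoder reads the BOM, picks big-endian and drops the BOM
        System.out.println(new String(bytes, StandardCharsets.UTF_16));
        Files.delete(p);
    }
}
```

The answer's "UTF-16LE" works the same way with the little-endian BOM FF FE; if you omit the BOM entirely, whoever reads the file has to know the byte order in advance.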

Or, if you don't need UTF-16, you can use UTF-8:

final byte[] title = "Title: ".getBytes("UTF-8");
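Java's UTF-8 encoder never emits a BOM, so the question's code works with this charset without further changes. One thing to keep in mind: the position of the second string must be the encoded byte length (title.length on the byte array, as in the question), not the character count, because non-ASCII characters take more than one byte in UTF-8. A small sketch with a hypothetical bodyOffset helper:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Offsets {
    // Byte offset at which the body should be written: the encoded length
    // of the title, which equals title.length() only for pure-ASCII text.
    static int bodyOffset(String title) {
        return title.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        byte[] title = "Title: ".getBytes(StandardCharsets.UTF_8);
        System.out.println(title.length);                   // 7: no BOM, 1 byte per ASCII char
        // "Caf\u00e9: " has 6 chars, but U+00E9 needs 2 bytes in UTF-8
        System.out.println(bodyOffset("Caf\u00e9: "));      // 7
        System.out.println("Caf\u00e9: ".length());         // 6
    }
}
```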
