java - Encoding a string as "modified UTF-8" for DataInput

Question

I want to encode a string value as modified UTF-8 bytes. Something like:

byte[] bytes = MagicEncoder.encode(str, "modified UTF-8");
DataInput input = new DataInputStream(new ByteArrayInputStream(bytes));

Every read*() method of DataInput must be able to read the underlying bytes correctly.

score 1 · Accepted Answer

Use DataOutputStream:

   ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
   DataOutputStream dataOutputStream = new DataOutputStream(byteOutputStream);
   dataOutputStream.writeUTF("some string to write");
   dataOutputStream.close();

The result is available from byteOutputStream.toByteArray().
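As a minimal sketch of the full round trip, the approach above can be wrapped in two helpers. The class name ModifiedUtf8RoundTrip and the methods encode and decode are hypothetical names chosen for illustration. Note that writeUTF emits a two-byte big-endian length before the modified UTF-8 bytes, which is what DataInput.readUTF() expects, and that it throws a UTFDataFormatException if the encoded form exceeds 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
final class ModifiedUtf8RoundTrip {

    // Encodes str exactly as DataOutputStream.writeUTF does:
    // a 2-byte length prefix followed by the modified UTF-8 bytes.
    public static byte[] encode(String str) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(str);
        } catch (IOException e) {
            // Also covers UTFDataFormatException for strings that encode to > 65535 bytes.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the string back through the DataInput API.
    public static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode("a\u0000b");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());            // prints "00 04 61 C0 80 62"
        System.out.println(decode(bytes).equals("a\u0000b")); // prints "true"
    }
}
```

The IOException is wrapped in an UncheckedIOException only to keep the call sites short; an in-memory stream should not fail for any other reason than an oversized string.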

score 0 · Accepted Answer

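For information:

Modified UTF-8 replaces the NUL character U+0000 (normally encoded as the single byte 0) with the byte sequence C0 80, that is, the ordinary two-byte form otherwise used for code points > 0x7F. (So a normal UTF-8 decoder is enough, provided it tolerates this overlong form; strict decoders reject C0 80.) Note that Java's modified UTF-8 also encodes supplementary characters (above U+FFFF) as two three-byte surrogate sequences rather than one four-byte sequence, which the conversion here does not handle.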

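// Assumption: str is the string from the question and contains only BMP characters.
byte[] originalBytes = str.getBytes(StandardCharsets.UTF_8); // needs java.nio.charset.StandardCharsets
int nulCount = 0;
for (int i = 0; i < originalBytes.length; ++i) {
    if (originalBytes[i] == 0) {
        ++nulCount;
    }
}

// Each NUL grows from one byte to two, so the result needs nulCount extra bytes.
byte[] convertedBytes = new byte[originalBytes.length + nulCount];
for (int i = 0, j = 0; i < originalBytes.length; ++i, ++j) {
    convertedBytes[j] = originalBytes[i];
    if (originalBytes[i] == 0) {
        // Overwrite the copied 0 with the two-byte sequence C0 80.
        convertedBytes[j] = (byte) 0xC0;
        ++j;
        convertedBytes[j] = (byte) 0x80;
    }
}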

It would be better to use System.arraycopy, and to check whether nulCount == 0.
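One possible way to apply both suggestions is sketched below. The class NulEscaper and the method escapeNuls are hypothetical names. The sketch assumes the input is standard UTF-8 for a string without supplementary characters, and it produces only the encoded bytes, without the two-byte length prefix that DataInput.readUTF() requires.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class NulEscaper {

    // Replaces every 0x00 byte in standard UTF-8 input with C0 80.
    // Assumption: the input contains no 4-byte sequences (no supplementary characters).
    public static byte[] escapeNuls(byte[] utf8) {
        int nulCount = 0;
        for (byte b : utf8) {
            if (b == 0) {
                nulCount++;
            }
        }
        if (nulCount == 0) {
            return utf8; // fast path: nothing to convert, return the input array itself
        }

        byte[] out = new byte[utf8.length + nulCount];
        int runStart = 0; // start of the current NUL-free run in utf8
        int j = 0;        // next write position in out
        for (int i = 0; i < utf8.length; i++) {
            if (utf8[i] == 0) {
                // Copy the NUL-free run before this position in one call.
                int runLength = i - runStart;
                System.arraycopy(utf8, runStart, out, j, runLength);
                j += runLength;
                out[j++] = (byte) 0xC0;
                out[j++] = (byte) 0x80;
                runStart = i + 1;
            }
        }
        // Copy whatever follows the last NUL.
        System.arraycopy(utf8, runStart, out, j, utf8.length - runStart);
        return out;
    }

    public static void main(String[] args) {
        byte[] converted = escapeNuls("a\u0000b".getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : converted) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "61 C0 80 62"
    }
}
```

Scanning raw bytes for 0 is safe here because in UTF-8 the byte 0x00 occurs only as the encoding of U+0000, never inside a multi-byte sequence.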
