java - String to bytes and back is not bijective if the bytes are modified

Question

The following code modifies every byte of a string and creates a new string from the result.

public static String convert(String s) {
    byte[] bytes = s.getBytes();
    byte[] convert = new byte[bytes.length];

    for (int i = 0; i < bytes.length; i++) {
        convert[i] = (byte) ~bytes[i];
    }

    return new String(convert);
}

Question: why is convert() not bijective? Applying it twice does not return the original string:

convert(convert("Test String")).equals("Test String") // evaluates to false
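A minimal reproduction, assuming the default charset is UTF-8. Here UTF-8 is pinned explicitly with StandardCharsets.UTF_8 so the result does not depend on the platform; the class name RoundTripDemo and the method name convertUtf8 are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Same logic as the question's convert(), but with UTF-8 pinned explicitly
    // instead of the platform default charset (an assumption for reproducibility).
    static String convertUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        byte[] convert = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            convert[i] = (byte) ~bytes[i]; // full 8-bit complement
        }
        return new String(convert, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Test String";
        String once = convertUtf8(original);
        String twice = convertUtf8(once);
        System.out.println(twice.equals(original));      // false
        // Some complemented bytes were invalid UTF-8 and became U+FFFD
        System.out.println(once.indexOf('\uFFFD') >= 0); // true
    }
}
```

Interestingly, the complemented bytes of " S" happen to form a valid two-byte UTF-8 sequence, so that fragment survives the round trip; the other bytes do not.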

score 3 · Accepted Answer

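When you use the String(byte[]) constructor, it does not necessarily produce one character per byte: it decodes with the platform's default charset. If that charset is UTF-8, the constructor decodes some characters from multiple bytes (two to four) rather than from a single byte.

When you complement the bytes one by one, the result may no longer be a valid byte sequence in the default charset. For ASCII input every complemented byte is 0x80 or higher, and in UTF-8 such bytes are only valid as parts of multi-byte sequences. Invalid ones are typically decoded to the replacement character U+FFFD, which discards the original byte, so applying convert() a second time cannot recover the input.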
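The loss can be seen on a single byte. This sketch, assuming UTF-8 as the charset (the class LossyDecodeDemo and helper decodeThenEncode are hypothetical names), complements 'T' (0x54) into 0xAB, a lone UTF-8 continuation byte, and shows that decoding and re-encoding it yields the three bytes of U+FFFD instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeDemo {
    // Decode bytes as UTF-8, then encode the resulting String back to UTF-8
    static byte[] decodeThenEncode(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ~'T' (0x54) is 0xAB: a continuation byte with no lead byte, i.e. malformed input
        byte[] complemented = { (byte) ~'T' };
        String decoded = new String(complemented, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFD"));                         // true
        System.out.println(Arrays.toString(decodeThenEncode(complemented)));  // [-17, -65, -67]
    }
}
```

The single input byte has become 0xEF 0xBF 0xBD, so no byte-wise operation applied afterwards can restore the original 0xAB.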

If you only use ASCII characters, you can try this version of the function:

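import java.nio.charset.Charset;

// ONLY if you use ASCII as Charset: non-ASCII characters are encoded as '?'
public static String convert(String s) {
    Charset ASCII = Charset.forName("ASCII");
    byte[] bytes = s.getBytes(ASCII);
    byte[] convert = new byte[bytes.length];

    for (int i = 0; i < bytes.length; i++) {
        // Complement only the low 7 bits so every result stays in 0x00..0x7F (valid ASCII)
        convert[i] = (byte) (~bytes[i] & 0x7F);
    }

    return new String(convert, ASCII);
}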
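A usage sketch for the ASCII version above, wrapped in a hypothetical class AsciiConvertDemo so it compiles on its own. Because ~b & 0x7F equals 0x7F - b for any ASCII byte b, applying it twice gives back b, which is why the round trip works for ASCII input.

```java
import java.nio.charset.Charset;

public class AsciiConvertDemo {
    // The answer's ASCII-only convert(), wrapped in a class
    public static String convert(String s) {
        Charset ASCII = Charset.forName("ASCII");
        byte[] bytes = s.getBytes(ASCII);
        byte[] convert = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Complement only the low 7 bits so each result stays in 0x00..0x7F
            convert[i] = (byte) (~bytes[i] & 0x7F);
        }
        return new String(convert, ASCII);
    }

    public static void main(String[] args) {
        System.out.println(convert(convert("Test String")).equals("Test String")); // true
        // Non-ASCII characters are replaced by '?' during encoding, so they do not survive
        System.out.println(convert(convert("caf\u00e9")));                        // caf?
    }
}
```

The second line shows the limit stated in the answer: the function is only an involution on pure ASCII strings.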

score 0 · Accepted Answer

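Because information is lost when the modified bytes are converted to a String (and when that String is converted back to bytes), in these lines:

for (int i = 0; i < bytes.length; i++) {
    convert[i] = (byte) ~bytes[i];
}

return new String(convert);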

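If you step into the implementation of the String-to-bytes conversion and its reverse, you will find that a Charset and a character encoding are involved. Reading up on them gives a detailed explanation of this behavior.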
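One way to apply the charset point, offered here as an illustrative variant that neither answer proposes: ISO-8859-1 maps every byte 0x00..0xFF to exactly one char U+0000..U+00FF, so decoding never drops or merges bytes and the full 8-bit complement round-trips. This is a sketch assuming the input only contains characters up to U+00FF; the class Latin1ConvertDemo and method convertLatin1 are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ConvertDemo {
    // Hypothetical variant: ISO-8859-1 is a one-to-one byte <-> char mapping
    // for U+0000..U+00FF, so no byte is ever replaced during decoding.
    public static String convertLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] convert = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            convert[i] = (byte) ~bytes[i]; // full 8-bit complement is safe here
        }
        return new String(convert, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Test String caf\u00e9";
        System.out.println(convertLatin1(convertLatin1(original)).equals(original)); // true
    }
}
```

Characters above U+00FF (for example the euro sign) are still encoded as '?', so this only widens the safe input range from ASCII to Latin-1; it does not make the function bijective over all strings.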
