
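1) Is the order of the high and low surrogate chars in a String fixed? Can I rely on it? Experimenting on Windows, the high surrogate comes first in the String (at the lower index, in terms of String.charAt(int index)). Is that always the case on every platform (Linux, etc.)? Is it documented anywhere?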

    int[] codePoint = { 0x1F71D };
    String s = new String(codePoint, 0, 1);
    System.out.println(s.length()); // 2
    System.out.println(s); // the single character U+1F71D

    System.out.println((int) Character.highSurrogate(codePoint[0]));
    System.out.println((int) Character.lowSurrogate(codePoint[0]));

    System.out.println((int) s.charAt(0)); // highSurrogate
    System.out.println((int) s.charAt(1)); // lowSurrogate

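2) Also, I'm a bit confused: is there any connection between the order of the high/low surrogate code units and endianness? I'd guess there is none, and the two concepts are orthogonal?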


1 Answer


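UTF-16 mandates that the high (leading) surrogate precede the low (trailing) one, and a Java String is a sequence of UTF-16 code units, so that is the order Java uses on every platform. Endianness is a byte order, not a char order: it only decides how the two bytes inside each 16-bit code unit are laid out, so it is orthogonal to the surrogate order. The JVM spec mandates big-endian order for the class-file format. Endianness at runtime is determined by the underlying physical platform. Some search engine time will grant you the details. http://www.unicode.org/ https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html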
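To make the distinction concrete, here is a minimal sketch (the class name `SurrogateOrderDemo` and the `hex` helper are made up for illustration). It checks that the char order inside the String is high surrogate then low surrogate, and that endianness only shows up once the chars are serialized to bytes with an explicit UTF-16 byte order:

```java
import java.nio.charset.StandardCharsets;

public class SurrogateOrderDemo {

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 0x1F71D;
        // Character.toChars returns the surrogate pair for a supplementary code point.
        String s = new String(Character.toChars(codePoint));

        // Char order inside the String: high surrogate at index 0, low at index 1.
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(Integer.toHexString(s.charAt(0)));       // d83d
        System.out.println(Integer.toHexString(s.charAt(1)));       // df1d

        // Byte order only matters when encoding to bytes. The surrogates stay
        // in the same order; only the two bytes within each code unit swap.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // D8 3D DF 1D
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 3D D8 1D DF
    }
}
```

In both encodings the high surrogate's bytes come before the low surrogate's bytes, which is why the two concepts can be treated as independent.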

Answered 2017-05-07T16:27:49.820