java - Escaping Unicode surrogate characters?

Question

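I have the following line of text (see also the code):

What I want to do is escape the emoticon (a phone icon) as two \u sequences and then convert it back to the original phone icon. The first approach below works, but I really want to escape a range of code points, so that any character of this kind gets escaped. I don't see how to do that with the first approach.

How can I use UnicodeEscaper to do this range-based escaping and get the same output as StringEscapeUtils (i.e. escape to two \uXXXX\uXXXX sequences, then unescape back to the phone icon)?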

import org.apache.commons.lang3.text.translate.UnicodeEscaper;
import org.apache.commons.lang3.text.translate.UnicodeUnescaper;

    String text = "Unicode surrogate here-> 📱<--here";
    // Escape the entire string. Not what I want, because there could be
    // \n, \r or other escape chars that I want left intact (I just want a range).
    String text2 = org.apache.commons.lang.StringEscapeUtils.escapeJava(text);
    System.out.println(text2);   // "Unicode surrogate here-> \uD83D\uDCF1<--here"
    // Unescape it back to the phone emoticon
    text2 = org.apache.commons.lang.StringEscapeUtils.unescapeJava(text2);
    System.out.println(text2); // "Unicode surrogate here-> 📱<--here"

    // How do I do the same as above, but only for a range of chars
    // (i.e. any Unicode surrogate) instead of escaping the entire string?
    text2 = UnicodeEscaper.between(0x10000, 0x10FFFF).translate(text);
    System.out.println(text2); // "Unicode surrogate here-> \u1F4F1<--here"
    // Unescape (I need the phone emoticon here)
    text2 = new UnicodeUnescaper().translate(text2);
    System.out.println(text2); // "Unicode surrogate here-> ὏1<--here"

score 3 · Accepted Answer

This answer is late, but I found that you need the

org.apache.commons.lang3.text.translate.JavaUnicodeEscaper

class rather than UnicodeEscaper.

With it, the output is:

Unicode surrogate here-> \uD83D\uDCF1<--here

and unescaping works fine.
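
For readers who want to see the round trip without commons-lang3 on the classpath, here is a minimal JDK-only sketch of the same idea: escape only supplementary code points, one escape per UTF-16 code unit, then convert the escapes back. The class and method names are hypothetical, and this is not how JavaUnicodeEscaper is implemented; it only aims to reproduce the output shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** JDK-only sketch; the names here are hypothetical and not part of commons-lang3. */
final class SurrogateEscapeSketch {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    /** Escapes only code points in U+10000..U+10FFFF, one escape per UTF-16 code unit. */
    static String escapeSupplementary(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                // toChars yields the high and low surrogate for this code point.
                for (char unit : Character.toChars(cp)) {
                    out.append(String.format("\\u%04X", (int) unit));
                }
            } else {
                // Everything in the BMP (including newlines and tabs) is copied unchanged.
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** Replaces each four-digit escape with its char; adjacent surrogates rejoin into one code point. */
    static String unescape(String input) {
        Matcher m = UNICODE_ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Unicode surrogate here-> \uD83D\uDCF1<--here\n";
        String escaped = escapeSupplementary(text);
        System.out.print(escaped);
        System.out.println(unescape(escaped).equals(text)); // true
    }
}
```

For the sample string, the escaped form is `Unicode surrogate here-> \uD83D\uDCF1<--here`, and the trailing newline is left alone. One limitation of this sketch: `unescape` also converts any four-digit escape that was already present as literal text in the input, so it is only a safe inverse when the original text contains none.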

score 2 · Accepted Answer

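Your string:

"Unicode surrogate here-> \u1F4F1<--here"

does not do what you think it does.

A char is basically a UTF-16 code unit, and is therefore 16 bits wide. So what happens here is that you get \u1f4f followed by the literal character 1; that explains your output.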
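
To make the 16-bit limit concrete, the following small sketch lists the UTF-16 code units of the misread string next to those of the intended code point. The class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CodeUnitDemo {

    /** Lists the UTF-16 code units (Java chars) of a string as four-digit hex. */
    static List<String> codeUnits(String s) {
        List<String> units = new ArrayList<>();
        for (char c : s.toCharArray()) {
            units.add(String.format("%04X", (int) c));
        }
        return units;
    }

    public static void main(String[] args) {
        // Four hex digits (1F4F) become one char; the trailing '1' stays a literal digit.
        String misread = new String(new char[] {(char) 0x1F4F, '1'});
        // The intended code point U+1F4F1 does not fit in a single char.
        String intended = new String(Character.toChars(0x1F4F1));

        System.out.println(codeUnits(misread));   // [1F4F, 0031]
        System.out.println(codeUnits(intended));  // [D83D, DCF1]
    }
}
```

Both strings have length 2, but only the second one holds a single code point.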

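I'm not sure what you mean by "escaping" here, but if it means replacing a surrogate pair with "\uXXXX\uXXXX", have a look at Character.toChars(). It returns the sequence of chars needed to represent a Unicode code point, whether that code point is in the BMP (one char) or outside it (two chars).

For code point U+1F4F1 it returns a two-element char array containing the chars 0xd83d and 0xdcf1. That is what you want.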
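
A short sketch of that call, together with its inverse Character.toCodePoint(), using only the JDK. The class name and the `charCount` helper are hypothetical.

```java
final class ToCharsDemo {

    /** Number of chars Character.toChars needs: 1 inside the BMP, 2 outside it. */
    static int charCount(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        char[] units = Character.toChars(0x1F4F1);
        System.out.println(Integer.toHexString(units[0])); // d83d (high surrogate)
        System.out.println(Integer.toHexString(units[1])); // dcf1 (low surrogate)

        // The inverse: recombine the surrogate pair into the original code point.
        int codePoint = Character.toCodePoint(units[0], units[1]);
        System.out.println(Integer.toHexString(codePoint)); // 1f4f1

        System.out.println(charCount(0x00E9)); // 1, because U+00E9 is inside the BMP
    }
}
```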
