java-native-interface - How to convert a Java string to a wide-character string using JNI

Question

A few months ago I wrote a Java API that wraps a C API using JNI. The C API uses char strings, and I used GetStringUTFChars to create C strings from Java strings.

I neglected to think about the problems that non-ASCII characters could cause.

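Since then, the author of the C API has added wide-character equivalents of each of his C functions that take or return strings; these take or return wchar_t strings. I want to update my Java API to use these wide-character functions and get past the problems I have had with non-ASCII characters.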

Having studied the JNI documentation, I am a little confused about the relative merits of the GetStringChars and GetStringRegion methods.

I know that the size of wchar_t differs between Windows and Linux, and I am not sure of the most efficient way to create the C strings (and then convert them back to Java strings).
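Since everything downstream depends on which case applies, here is a minimal sketch of detecting the width of wchar_t, at compile time through the standard WCHAR_MAX macro and at run time through sizeof. The names WCHAR_IS_UTF16 and wchar_bits are hypothetical, chosen only for illustration; the assumption is the usual one that a 16-bit wchar_t holds UTF-16 units (Windows) and a 32-bit one holds whole code points (e.g. glibc on Linux).

```c
#include <limits.h>  /* CHAR_BIT */
#include <wchar.h>   /* wchar_t, WCHAR_MAX */

/* Compile-time check (hypothetical macro name): 1 when a wchar_t can only
   hold a 16-bit unit, as on Windows; 0 when it is wide enough for any
   Unicode code point, as with the 4-byte wchar_t on Linux. */
#if WCHAR_MAX <= 0xFFFF
#define WCHAR_IS_UTF16 1
#else
#define WCHAR_IS_UTF16 0
#endif

/* Run-time counterpart (hypothetical helper): width of wchar_t in bits. */
static unsigned wchar_bits(void) {
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}
```

The preprocessor form lets the conversion path be chosen with `#if`, so the unused branch is not even compiled on a given platform.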

This is the code I have at the moment, which I think creates a string with two bytes per character:

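jsize len;   /* GetStringLength returns a jsize */
jchar *Src;

len = (*env)->GetStringLength(env, jSrc);   /* length in UTF-16 code units */
printf("Length of jSrc is %d\n", (int)len);

/* one extra jchar for the terminating zero */
Src = (jchar *)malloc((len + 1) * sizeof(jchar));
if (Src != NULL) {
    (*env)->GetStringRegion(env, jSrc, 0, len, Src);
    Src[len] = 0;
    /* ... call the C API with Src here, then free(Src) ... */
}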

However, this will need changing wherever the size of wchar_t differs from that of jchar.

score 2 · Accepted Answer

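Would the C API's author be willing to take a step back and reimplement it with UTF-8? :) Your work would essentially vanish; you would only need GetStringUTFChars / NewStringUTF.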

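jchar is typedef'd to unsigned short and is equivalent to the JVM's char, which is UTF-16. So on Windows, where wchar_t is also 2-byte UTF-16, you can get away with the code you presented: just copy the raw bytes and allocate accordingly. Don't forget to free the buffer once you are done with the C API call. Complement this with NewString for the conversion back to a jstring.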
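A minimal sketch of that 2-byte case, assuming jchar is modeled here as uint16_t so the code compiles without jni.h. In real JNI code, src would be the buffer filled by GetStringRegion as in the question. The helper name utf16_copy_as_wchar is hypothetical, and it deliberately refuses (returns NULL) when wchar_t is not 2 bytes, because a plain copy is only valid when the widths match.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Stand-in for JNI's jchar: an unsigned 16-bit UTF-16 code unit. */
typedef uint16_t utf16_unit;

/* Copy len UTF-16 units into a malloc'd, zero-terminated wchar_t buffer.
   Only valid when wchar_t is 2 bytes (e.g. Windows); returns NULL when the
   widths differ or allocation fails. The caller must free() the result. */
static wchar_t *utf16_copy_as_wchar(const utf16_unit *src, size_t len) {
    if (sizeof(wchar_t) != sizeof(utf16_unit))
        return NULL;  /* different width: a real conversion is needed */
    wchar_t *dst = malloc((len + 1) * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len * sizeof(utf16_unit));  /* raw byte copy */
    dst[len] = L'\0';
    return dst;
}
```

For the way back, the same 16-bit units can be passed to `NewString(env, units, len)`, whose documented parameters are a `const jchar *` and a `jsize` length.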

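The only other size of wchar_t I know of is 4 bytes (most prominently on Linux), which is UTF-32. Here is the catch: UTF-32 is not just UTF-16 somehow padded to 4 bytes. Allocating double the memory is only the start. There is a substantial conversion to do, like this one, whose license seems liberal enough.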
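To show what that conversion involves, here is a minimal sketch of both directions between UTF-16 units and UTF-32 code points. It uses uint16_t and uint32_t rather than jchar and wchar_t so it compiles without jni.h; on a platform with a 4-byte wchar_t each uint32_t value can be stored in a wchar_t. The function names are hypothetical, and replacing unpaired surrogates with U+FFFD is a policy choice of this sketch, not something JNI requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-16 units (e.g. from GetStringRegion) into code points.
   dst needs room for len values: a surrogate pair shrinks to one value.
   Unpaired surrogates become U+FFFD. Returns the number of code points. */
static size_t utf16_to_utf32(const uint16_t *src, size_t len, uint32_t *dst) {
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t u = src[i++];
        if (u >= 0xD800 && u <= 0xDBFF && i < len &&
            src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
            /* high + low surrogate -> one supplementary code point */
            u = 0x10000 + ((u - 0xD800) << 10) + (uint32_t)(src[i++] - 0xDC00);
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            u = 0xFFFD;  /* lone surrogate */
        }
        dst[n++] = u;
    }
    return n;
}

/* Encode code points back into UTF-16 units (e.g. before NewString).
   dst needs room for 2 * len units in the worst case. Values above
   U+10FFFF or inside the surrogate range become U+FFFD. */
static size_t utf32_to_utf16(const uint32_t *src, size_t len, uint16_t *dst) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            c = 0xFFFD;
        if (c >= 0x10000) {
            c -= 0x10000;  /* split into a surrogate pair */
            dst[n++] = (uint16_t)(0xD800 + (c >> 10));
            dst[n++] = (uint16_t)(0xDC00 + (c & 0x3FF));
        } else {
            dst[n++] = (uint16_t)c;
        }
    }
    return n;
}
```

For example, the UTF-16 sequence 0x0041, 0xD83D, 0xDE00, 0x00E9 decodes to the three code points U+0041, U+1F600, U+00E9, which is why the element count of a string changes between the two representations.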

But if you are not that concerned about performance and are willing to give up the plain memory copy on Windows, I suggest going from jstring to UTF-8 (which JNI provides natively through documented functions; strictly speaking, JNI's "modified UTF-8") and then from UTF-8 to UTF-16 or UTF-32 depending on sizeof(wchar_t). That way you make no assumptions about the byte order and UTF encoding each platform uses. You seem to care about this: I see that you are checking sizeof(jchar), which is 2 for most of the visible universe :)
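One detail matters for that route: GetStringUTFChars returns JNI's modified UTF-8, in which U+0000 is encoded as the two bytes C0 80 and a supplementary character is encoded as its two UTF-16 surrogates, each as a 3-byte sequence, rather than as one 4-byte sequence. A minimal decoding sketch under that assumption, with hypothetical function names, producing code points that can then be stored in a 4-byte wchar_t or re-encoded as surrogate pairs for a 2-byte one:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 1- to 3-byte modified-UTF-8 sequence into a 16-bit unit.
   Returns bytes consumed, or 0 if malformed. Overlong forms (such as the
   C0 80 used for U+0000) are accepted, not rejected. */
static size_t mutf8_unit(const unsigned char *s, uint16_t *out) {
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *out = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    /* && short-circuits, so s[2] is never read past a terminating zero */
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *out = (uint16_t)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) |
                          (s[2] & 0x3F));
        return 3;
    }
    return 0;
}

/* Convert a zero-terminated modified-UTF-8 string (e.g. the result of
   GetStringUTFChars) to code points, joining encoded surrogate pairs.
   dst needs room for strlen(str) values. Unpaired surrogates become
   U+FFFD. Returns the count, or (size_t)-1 on malformed input. */
static size_t mutf8_to_utf32(const char *str, uint32_t *dst) {
    const unsigned char *s = (const unsigned char *)str;
    size_t n = 0;
    while (*s) {
        uint16_t hi, lo;
        size_t k = mutf8_unit(s, &hi);
        if (k == 0)
            return (size_t)-1;
        s += k;
        uint32_t cp = hi;
        if (hi >= 0xD800 && hi <= 0xDBFF && *s) {
            size_t k2 = mutf8_unit(s, &lo);
            if (k2 != 0 && lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((uint32_t)(hi - 0xD800) << 10) +
                     (uint32_t)(lo - 0xDC00);
                s += k2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;  /* surrogate without a partner */
        dst[n++] = cp;
    }
    return n;
}
```

A general-purpose UTF-8 decoder would instead expect 4-byte sequences for supplementary characters, which is why feeding GetStringUTFChars output to one can mis-decode characters outside the Basic Multilingual Plane.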
