java - Convert accented characters to English using Java

Question

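I have a requirement to support searching with accented characters, which may be entered by users from Iceland and Japan. The code I wrote works for some accented characters, but not all. Examples below: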

À - returns a. Correct.
Â - returns a. Correct.
Ð - returns Ð. This is breaking. It should return e.
Õ - returns Õ. This is breaking. It should return o.

Here is my code:

String accentConvertStr = StringUtils.stripAccents(myKey);

I also tried this:

byte[] b = key.getBytes("Cp1252");
System.out.println("" + new String(b, StandardCharsets.UTF_8));

Please advise.

score 0 · Accepted Answer

I would say it is working as expected. The code underlying StringUtils.stripAccents is essentially the following.

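import java.text.Normalizer;

String[] chars = new String[]{"À", "Â", "Ð", "Õ"};

for (String c : chars) {
  // NFD splits each letter into a base letter plus combining marks; the regex then drops the marks
  String normalized = Normalizer.normalize(c, Normalizer.Form.NFD);
  System.out.println(normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
}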

This outputs: A A Ð O
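To see why Ð survives while the others do not, it can help to print the code points that NFD normalization produces. The following is a minimal sketch using only java.text.Normalizer; the class name DecompositionDemo and the helper nfdCodePoints are made up for illustration, and the input characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of any library.
class DecompositionDemo {

    // Returns the code points of the NFD form of s, separated by spaces.
    static String nfdCodePoints(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00C0 = À, U+00C2 = Â, U+00D0 = Ð, U+00D5 = Õ
        String[] chars = {"\u00C0", "\u00C2", "\u00D0", "\u00D5"};
        for (String c : chars) {
            System.out.println(c + " -> " + nfdCodePoints(c));
        }
    }
}
```

À, Â and Õ each decompose into a base letter followed by a combining mark (for example U+0041 U+0300 for À), so removing the marks leaves A or O. Ð (U+00D0) has no canonical decomposition, so it stays a single code point and there is no mark to remove.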

If you read the answer at https://stackoverflow.com/a/5697575/9671280, you will find:

Be aware that that will not remove what you might think of as “accent” marks from all characters! There are many it will not do this for. For example, you cannot convert Đ to D or ø to o that way. For that, you need to reduce code points to those that match the same primary collation strength in the Unicode Collation Table.
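Since the end goal here is search, one option in the spirit of that quote is to compare strings at primary collation strength instead of rewriting them. The sketch below uses the JDK's built-in java.text.Collator as a stand-in; it is not a full implementation of the Unicode Collation Algorithm (ICU4J is the usual choice for that), and the class and method names are made up for illustration. It only demonstrates letters whose accents are ordinary diacritics. Whether Ð or ø compare equal to d or o depends on the collation rules in use, so that is not asserted here.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class PrimaryStrengthDemo {

    // True when a and b differ only by accents and/or case,
    // according to the JDK's root-locale collation rules.
    static boolean matchesIgnoringAccents(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // PRIMARY strength ignores accent (secondary) and case (tertiary) differences.
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // U+00C0 = À, U+00E9 = é
        System.out.println(matchesIgnoringAccents("\u00C0", "a"));
        System.out.println(matchesIgnoringAccents("r\u00E9sum\u00E9", "resume"));
        System.out.println(matchesIgnoringAccents("a", "b"));
    }
}
```

This approach fits lookups and equality checks; it does not give you a stripped string to store or display.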

If you still want to use StringUtils.stripAccents, you can handle those characters separately.
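Handling them separately could look something like the sketch below: strip the combining marks first, then run the leftovers through a small replacement table. To keep it self-contained it uses java.text.Normalizer rather than calling StringUtils.stripAccents; with Commons Lang on the classpath you could call that instead for the first step. The class name, the method name and the table contents are all assumptions. In particular, mapping Ð to D is my choice, so adjust the table to whatever your search should treat as equivalent. Map.of needs Java 9 or later.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class StripWithFallback {

    // Assumed replacements for letters that have no canonical decomposition.
    // Extend or change this table to suit your data.
    private static final Map<Character, String> EXTRA = Map.of(
            '\u00D0', "D",  // Ð (eth)
            '\u00F0', "d",  // ð
            '\u0110', "D",  // Đ (D with stroke)
            '\u0111', "d",  // đ
            '\u00D8', "O",  // Ø
            '\u00F8', "o"   // ø
    );

    static String strip(String input) {
        // Step 1: decompose and drop combining diacritical marks.
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        String noMarks = nfd.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");

        // Step 2: replace the remaining special letters from the table.
        StringBuilder out = new StringBuilder(noMarks.length());
        for (int i = 0; i < noMarks.length(); i++) {
            char ch = noMarks.charAt(i);
            String replacement = EXTRA.get(ch);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00C0 = À, U+00C2 = Â, U+00D0 = Ð, U+00D5 = Õ
        System.out.println(strip("\u00C0\u00C2\u00D0\u00D5"));
    }
}
```

A hand-written table only covers the characters you list, which is fine for a known set of inputs but will not scale to arbitrary scripts.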

Please try https://github.com/xuender/unidecode; it seems to work for your case.

 String normalized = Unidecode.decode(input);
