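I am building a HashMap whose keys are Unicode characters and whose values are the plain characters they should map to, based on this file: https://github.com/lmjabreu/solr-conftemplate/blob/master/mapping-ISOLatin1Accent.txt
So far I have written the following code to remove accents from a string: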
import java.util.HashMap;

public class ACCENTS {
public static void main(String[] args){
// Maps each accented character to its ASCII equivalent.
HashMap<Character, Character> characterMappings = new HashMap<>();
characterMappings.put('\u00C0', 'A');
characterMappings.put('\u00C1', 'A');
characterMappings.put('\u00C2', 'A');
characterMappings.put('\u00C3', 'A');
characterMappings.put('\u00C4', 'A');
characterMappings.put('\u00C5', 'A');
characterMappings.put('\u00C7', 'C');
characterMappings.put('\u00C8', 'E');
characterMappings.put('\u00C9', 'E');
characterMappings.put('\u00CA', 'E');
characterMappings.put('\u00CB', 'E');
characterMappings.put('\u00CC', 'I');
characterMappings.put('\u00CD', 'I');
characterMappings.put('\u00CE', 'I');
characterMappings.put('\u00CF', 'I');
characterMappings.put('\u00D0', 'D');
characterMappings.put('\u00D1', 'N');
characterMappings.put('\u00D2', 'O');
characterMappings.put('\u00D3', 'O');
characterMappings.put('\u00D4', 'O');
characterMappings.put('\u00D5', 'O');
characterMappings.put('\u00D6', 'O');
characterMappings.put('\u00D8', 'O');
characterMappings.put('\u00D9', 'U');
characterMappings.put('\u00DA', 'U');
characterMappings.put('\u00DB', 'U');
characterMappings.put('\u00DC', 'U');
characterMappings.put('\u00DD', 'Y');
characterMappings.put('\u0178', 'Y');
characterMappings.put('\u00E0', 'a');
characterMappings.put('\u00E1', 'a');
characterMappings.put('\u00E2', 'a');
characterMappings.put('\u00E3', 'a');
characterMappings.put('\u00E4', 'a');
characterMappings.put('\u00E5', 'a');
characterMappings.put('\u00E7', 'c');
characterMappings.put('\u00E8', 'e');
characterMappings.put('\u00E9', 'e');
characterMappings.put('\u00EA', 'e');
characterMappings.put('\u00EB', 'e');
characterMappings.put('\u00EC', 'i');
characterMappings.put('\u00ED', 'i');
characterMappings.put('\u00EE', 'i');
characterMappings.put('\u00EF', 'i');
characterMappings.put('\u00F0', 'd');
characterMappings.put('\u00F1', 'n');
characterMappings.put('\u00F2', 'o');
characterMappings.put('\u00F3', 'o');
characterMappings.put('\u00F4', 'o');
characterMappings.put('\u00F5', 'o');
characterMappings.put('\u00F6', 'o');
characterMappings.put('\u00F8', 'o');
characterMappings.put('\u00F9', 'u');
characterMappings.put('\u00FA', 'u');
characterMappings.put('\u00FB', 'u');
characterMappings.put('\u00FC', 'u');
characterMappings.put('\u00FD', 'y');
characterMappings.put('\u00FF', 'y');
String token = "nа̀ра";
StringBuilder newString = new StringBuilder();
for (int i = 0; i < token.length(); ++i) {
char c = token.charAt(i);
// Use the mapped character when one exists; otherwise keep the original.
newString.append(characterMappings.getOrDefault(c, c));
}
System.out.println(newString);
}
}
The expected result is "napa", but no conversion takes place at all. What could be causing this? I have not been able to find the reason.
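
One way to narrow this down is to print the code points the token actually contains and compare them with the keys in the map. The sketch below is only a diagnostic aid, not a confirmed explanation: the class `CodePointDump` and the helpers `describe` and `stripMarks` are hypothetical names, and the sample strings are illustrative rather than taken from the token above. It relies on the fact that an accented letter can be stored either as one precomposed character or as a base letter followed by a combining mark, and that a letter from another script can look identical to a Latin one. In both cases a lookup keyed on the precomposed Latin character would miss.

```java
import java.text.Normalizer;

class CodePointDump {
    // Hypothetical helper: lists every UTF-16 unit of s in U+XXXX form.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper: decomposes to NFD, then drops all combining marks.
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String precomposed = "\u00E0";      // single precomposed character
        String decomposed = "a\u0300";      // Latin a + combining grave accent
        String cyrillic = "\u0430\u0300";   // Cyrillic a + combining grave accent

        System.out.println(describe(precomposed)); // U+00E0
        System.out.println(describe(decomposed));  // U+0061 U+0300
        System.out.println(describe(cyrillic));    // U+0430 U+0300

        // Both Latin forms reduce to a plain "a"; the Cyrillic letter stays Cyrillic.
        System.out.println(describe(stripMarks(precomposed))); // U+0061
        System.out.println(describe(stripMarks(decomposed)));  // U+0061
        System.out.println(describe(stripMarks(cyrillic)));    // U+0430
    }
}
```

Calling `describe(token)` on the real token would show whether its characters match any key in `characterMappings`. If the dump shows combining marks such as U+0300, normalizing before the lookup may help; if it shows letters outside the Latin ranges, the map would need entries for those letters as well.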