This is my quick implementation of a Polish stop list that normalizes Polish diacritics.
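import java.util.HashSet;
import java.util.Locale;

class StopList
{
    // Stop words, stored trimmed, lower-cased and stripped of Polish diacritics.
    private final HashSet<String> set = new HashSet<String>();

    public void add(String word)
    {
        // Locale.ROOT keeps lower-casing independent of the JVM's default locale.
        word = word.trim().toLowerCase(Locale.ROOT);
        word = normalize(word);
        set.add(word);
    }

    // Canonicalizes the query the same way add() does, so that a capitalized
    // or accented form of a stored word is still found.
    public boolean contains(final String string)
    {
        if (string == null)
        {
            return false;
        }
        return set.contains(normalize(string.trim().toLowerCase(Locale.ROOT)));
    }

    // Maps a lower-case Polish letter with a diacritic to its plain ASCII counterpart.
    private char normalizeChar(final char c)
    {
        switch (c)
        {
            case 'ą':
                return 'a';
            case 'ć':
                return 'c';
            case 'ę':
                return 'e';
            case 'ł':
                return 'l';
            case 'ń':
                return 'n';
            case 'ó':
                return 'o';
            case 'ś':
                return 's';
            case 'ż':
            case 'ź':
                return 'z';
        }
        return c;
    }

    // Replaces every Polish diacritic in the word; null and empty input pass through unchanged.
    private String normalize(final String word)
    {
        if (word == null || word.isEmpty())
        {
            return word;
        }
        char[] charArray = word.toCharArray();
        char[] normalizedArray = new char[charArray.length];
        for (int i = 0; i < normalizedArray.length; i++)
        {
            normalizedArray[i] = normalizeChar(charArray[i]);
        }
        return new String(normalizedArray);
    }
}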
I couldn't find any other solution on the web, so maybe it will help someone (?)
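For comparison, here is a minimal sketch of the same idea built on the standard `java.text.Normalizer` instead of a hand-written `switch`. It is a different technique from the class above, not a drop-in copy: it strips combining marks from any letter, not only the Polish ones. The class name `FoldingStopList` and the method `fold` are made up for this illustration. One detail worth knowing is that `ł` has no Unicode decomposition, so it still has to be mapped by hand.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class FoldingStopList
{
    private final Set<String> words = new HashSet<>();

    // Hypothetical helper: trim, lower-case, then strip diacritics.
    public static String fold(String word)
    {
        String lower = word.trim().toLowerCase(Locale.ROOT);
        // NFD splits e.g. 'ą' into 'a' followed by a combining ogonek.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Drop all combining marks; 'ł' (U+0142) does not decompose, so replace it explicitly.
        return decomposed.replaceAll("\\p{M}", "").replace('\u0142', 'l');
    }

    public void add(String word)
    {
        words.add(fold(word));
    }

    public boolean contains(String word)
    {
        return word != null && words.contains(fold(word));
    }

    public static void main(String[] args)
    {
        FoldingStopList stop = new FoldingStopList();
        stop.add("Że");
        stop.add("który");
        System.out.println(stop.contains("ze"));     // true
        System.out.println(stop.contains("KTÓRY"));  // true
        System.out.println(stop.contains("dom"));    // false
    }
}
```

Because both `add` and `contains` go through the same `fold` step, the accented, unaccented and capitalized spellings of a word all end up as the same key in the set.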