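You simply loop over the content and test each character with the Character methods. I use real code points, so this also supports Unicode supplementary characters.
When working with code points, the index cannot simply be incremented by one, because some code points occupy two chars (also called code units). That is why I use a while loop and Character.charCount(int cp).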
/** Method counts and prints number of lower/uppercase codepoints. */
static void countCharacterClasses(String input) {
    int upper = 0;
    int lower = 0;
    int other = 0;
    // index counts from 0 till end of string length
    int index = 0;
    while (index < input.length()) {
        // we get the unicode code point at index
        // this is the character at index-th position (but fits only in an int)
        int cp = input.codePointAt(index);
        // we increment index by 1 or 2, depending if cp fits in single char
        index += Character.charCount(cp);
        // the type of the codepoint is the character class
        int type = Character.getType(cp);
        // we care only about the character class for lower & uppercase letters
        switch (type) {
            case Character.UPPERCASE_LETTER:
                upper++;
                break;
            case Character.LOWERCASE_LETTER:
                lower++;
                break;
            default:
                other++;
        }
    }
    System.out.printf("Input has %d upper, %d lower and %d other codepoints%n",
            upper, lower, other);
}
For this example, the result will be:
// test with plain letters, numbers and international chars:
countCharacterClasses("AABBÄäoßabc0\uD801\uDC00");
// U+10400 "DESERET CAPITAL LETTER LONG I" is 2 char UTF16: D801 DC00
Input has 6 upper, 6 lower and 1 other codepoints
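It counts the German sharp s (ß) as lowercase (it has no simple uppercase mapping) and the special supplementary code point (two code units/chars long) as uppercase. The digit is counted as "other".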
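To make the two-code-unit point concrete, here is a minimal sketch that compares the number of chars in a string with the number of code points found by the same while/charCount loop. The class name CodePointStep and the helper countCodePoints are made up for this illustration and are not part of the answer's code.

```java
class CodePointStep {
    /** Hypothetical helper: counts code points by stepping with Character.charCount. */
    static int countCodePoints(String s) {
        int count = 0;
        int index = 0;
        while (index < s.length()) {
            int cp = s.codePointAt(index);
            // advance by 1 for BMP code points, by 2 for supplementary ones
            index += Character.charCount(cp);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' followed by U+10400, which is stored as the surrogate pair D801 DC00
        String s = "a\uD801\uDC00";
        System.out.println(s.length());                    // 3 chars (code units)
        System.out.println(countCodePoints(s));            // 2 code points
        System.out.println(Character.charCount(0x10400));  // 2
    }
}
```

Stepping the index by one instead would land on the low surrogate DC00 and treat it as a separate (unpaired) code point.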
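The advantage of using Character.getType(int cp) instead of Character.isUpperCase() is that it needs to look at the code point only once to test for several (or all) character classes. This can also be used to count all the different classes (letters, whitespace, control characters, and all the other fancy Unicode classes such as TITLECASE_LETTER).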
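As a rough sketch of that last idea, the same loop can tally every class in one pass by using the value returned by Character.getType as a map key. The class name CategoryHistogram and the method countByType are hypothetical names chosen for this example; the keys are the int values of the Character constants (UPPERCASE_LETTER, SPACE_SEPARATOR, and so on).

```java
import java.util.Map;
import java.util.TreeMap;

class CategoryHistogram {
    /** Hypothetical helper: maps each Character.getType value to its number of code points. */
    static Map<Integer, Integer> countByType(String input) {
        Map<Integer, Integer> counts = new TreeMap<>();
        int index = 0;
        while (index < input.length()) {
            int cp = input.codePointAt(index);
            index += Character.charCount(cp);
            // one getType lookup per code point, whatever its class
            counts.merge(Character.getType(cp), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // uppercase, lowercase, space, titlecase (U+01C5) and a digit
        Map<Integer, Integer> counts = countByType("Aa \u01C51");
        // the constants are bytes, so cast to int before using them as keys
        System.out.println(counts.get((int) Character.TITLECASE_LETTER));
        System.out.println(counts);
    }
}
```

Note the cast when looking up a key: the Character constants are declared as byte, so passing one to get() without the cast would box it as a Byte and miss the Integer key.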
For good background on why you need to care about code points and code units, see: http://www.joelonsoftware.com/articles/Unicode.html