
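I have a six-digit Unicode character, for example U+100000, which I want to compare with another char in my C# code.

My reading of the MSDN documentation is that this character cannot be represented by a char, and must instead be represented by a string:

A Unicode character in the range U+10000 to U+10FFFF is not permitted in a character literal and is represented using a Unicode surrogate pair in a string literal.

I feel like I'm missing something obvious, but how can you get the following comparison to work correctly: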

public bool IsCharLessThan(char myChar, string upperBound)
{
    return myChar < upperBound; // will not compile as a char is not comparable to a string
}

Assert.IsTrue(AnExample('\u0066', "\u100000"));
Assert.IsFalse(AnExample("\u100000", "\u100000")); // again won't compile as this is a string and not a char

Edit

OK, I think I need two methods: one that accepts a char and another that accepts a "big char", i.e. a string. So:

public bool IsCharLessThan(char myChar, string upperBound)
{
    return true; // every char is less than a BigChar
}

public bool IsCharLessThan(string myBigChar, string upperBound)
{
    return string.Compare(myBigChar, upperBound) < 0;
}

Assert.IsTrue(AnExample('\u0066', "\u100000"));
Assert.IsFalse(AnExample("\u100022", "\u100000"));

2 Answers

5

To construct a string with the Unicode code point U+10FFFF using a string literal, you need to work out the surrogate pair involved.

In this case, you need:

string bigCharacter = "\uDBFF\uDFFF";

Or you can use char.ConvertFromUtf32:

string bigCharacter = char.ConvertFromUtf32(0x10FFFF);
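
For reference, here is a minimal sketch of how the surrogate pair can be worked out by hand. The class and method names (SurrogateMath, ToSurrogatePair) are made up for illustration; the arithmetic is the standard UTF-16 encoding of a supplementary code point, and in real code char.ConvertFromUtf32 does the same job.

```csharp
using System;

public static class SurrogateMath
{
    // Hypothetical helper: splits a supplementary code point
    // (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
    public static (char High, char Low) ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
        {
            throw new ArgumentOutOfRangeException(nameof(codePoint));
        }

        // Subtracting 0x10000 leaves a 20-bit value.
        int offset = codePoint - 0x10000;

        // The top 10 bits go into the high surrogate, the bottom 10 into the low one.
        char high = (char)(0xD800 + (offset >> 10));
        char low = (char)(0xDC00 + (offset & 0x3FF));
        return (high, low);
    }
}
```

For U+10FFFF the offset is 0xFFFFF, so the high surrogate is 0xD800 + 0x3FF = 0xDBFF and the low surrogate is 0xDC00 + 0x3FF = 0xDFFF, which matches the literal above.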

It's not clear what you want your method to achieve, but if you need it to work with characters not in the BMP, you'll need to make it accept int instead of char, or a string.

As per the documentation for string, if you want to iterate over characters in a string as full Unicode values, use TextElementEnumerator or StringInfo.

Note that you do need to do this explicitly. If you just use ordinal values, it will check UTF-16 code units, not the UTF-32 code points. For example:

string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
Console.WriteLine(string.Compare(text, upperBound, StringComparison.Ordinal));

This prints out a value greater than zero, suggesting that text is greater than upperBound here. Instead, you should use char.ConvertToUtf32:

string text = "\uF000";
string upperBound = "\uDBFF\uDFFF";
int textUtf32 = char.ConvertToUtf32(text, 0);
int upperBoundUtf32 = char.ConvertToUtf32(upperBound, 0);
Console.WriteLine(textUtf32 < upperBoundUtf32); // True

So that's probably what you need to do in your method. You might want to use StringInfo.LengthInTextElements to check that the strings really are single UTF-32 code points first.
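
Putting that together, a possible shape for the method from the question is sketched below. This is only an illustration: the names CodePointComparer, ToSingleCodePoint and IsCodePointLessThan are invented here, and the single-code-point check uses char.IsSurrogatePair and the string length rather than StringInfo.LengthInTextElements, as a simpler assumption about what "one character" should mean.

```csharp
using System;

public static class CodePointComparer
{
    // Hypothetical helper: returns the code point of a string that holds
    // exactly one Unicode code point (one char, or one surrogate pair).
    public static int ToSingleCodePoint(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            throw new ArgumentException("Expected exactly one code point.", nameof(value));
        }

        // A supplementary character occupies two UTF-16 code units, a BMP character one.
        int expectedLength = char.IsSurrogatePair(value, 0) ? 2 : 1;
        if (value.Length != expectedLength)
        {
            throw new ArgumentException("Expected exactly one code point.", nameof(value));
        }

        // Throws ArgumentException if the string is a lone surrogate.
        return char.ConvertToUtf32(value, 0);
    }

    // Compares by code point value rather than by UTF-16 code unit.
    public static bool IsCodePointLessThan(string value, string upperBound)
    {
        return ToSingleCodePoint(value) < ToSingleCodePoint(upperBound);
    }
}
```

With this sketch, CodePointComparer.IsCodePointLessThan("\uF000", "\uDBFF\uDFFF") returns true, whereas the ordinal string comparison above ranks the two the other way round. A caller holding a char can pass myChar.ToString().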

answered 2012-10-26T20:25:18.433
1

According to https://msdn.microsoft.com/library/aa664669.aspx, you have to use the \U escape with all 8 hex digits. For example:

string str1 = "\U0001F300";
string str2 = "\uD83C\uDF00";
bool eq = str1 == str2;

using the :cyclone: emoji.

answered 2015-06-11T12:49:28.893