
I am using Visual Studio 2010 with C# to convert text into Unicode values. For example, I have a string abc = "मेरा". There are 4 characters in this string, and I need the Unicode values of all four. Please help me.


5 Answers


Since a .NET char is a UTF-16 code unit, which is identical to the Unicode code point for any character in the Basic Multilingual Plane (BMP), you can simply enumerate all characters in a string:

    var abc = "मेरा";

    foreach (var c in abc)
    {
        Console.WriteLine((int)c);
    }

resulting in

    2350
    2375
    2352
    2366
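Casting each char only gives the code point for BMP characters; a character outside the BMP is stored as a surrogate pair of two chars. A minimal sketch that handles both cases, assuming you want one int per code point (the class name CodePoints and method Of are hypothetical):

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Returns one entry per Unicode code point in s. A UTF-16 surrogate
    // pair is combined into a single value instead of two separate chars.
    public static List<int> Of(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            // char.ConvertToUtf32(string, int) reads the whole pair when s[i]
            // is a high surrogate; it throws ArgumentException for an unpaired one.
            result.Add(char.ConvertToUtf32(s, i));
            if (char.IsHighSurrogate(s[i]))
                i++; // skip the low surrogate that was already consumed
        }
        return result;
    }

    static void Main()
    {
        string abc = "\u092E\u0947\u0930\u093E"; // "मेरा"
        foreach (int cp in Of(abc))
            Console.WriteLine("{0} (U+{0:X4})", cp); // first line: 2350 (U+092E)
    }
}
```

For "मेरा" this gives the same four values as the loop above; the difference only shows up for characters such as emoji, which the char loop would report as two separate surrogate values.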
answered 2011-05-05T19:57:21.297
answered 2011-05-05T19:56:50.630

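Use

    System.Text.Encoding.UTF8.GetBytes(abc)

That returns the string encoded as UTF-8 bytes (three bytes per Devanagari character, so 12 bytes for this string), not the code point values themselves.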
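Note that GetBytes yields UTF-8 bytes rather than code points. A small sketch to inspect what it produces (the class Utf8Demo and method Utf8Hex are hypothetical names for illustration):

```csharp
using System;
using System.Text;

static class Utf8Demo
{
    // Encodes s as UTF-8 and returns the bytes as dash-separated hex.
    public static string Utf8Hex(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        return BitConverter.ToString(utf8);
    }

    static void Main()
    {
        string abc = "\u092E\u0947\u0930\u093E"; // "मेरा"

        // Each of these code points lies in U+0800..U+FFFF, so UTF-8 uses 3 bytes apiece.
        Console.WriteLine(Encoding.UTF8.GetBytes(abc).Length); // 12
        Console.WriteLine(Utf8Hex(abc)); // E0-A4-AE-E0-A5-87-E0-A4-B0-E0-A4-BE

        // Decoding the bytes restores the original string.
        Console.WriteLine(Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(abc)) == abc); // True
    }
}
```

So U+092E (2350) becomes the three bytes E0 A4 AE; to get 2350 back you have to decode the bytes rather than read them directly.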

answered 2011-05-05T19:34:39.800

If you have the string s = "मेरा", then you already have the answer.

This string contains four code points in the BMP, which in UTF-16 are represented by 8 bytes (two bytes per code point). You can access them by index with s[i], with a foreach loop, etc.

If you want the underlying 8 bytes, you can get them like so:

    string str = @"मेरा";
    // GetBytes is an instance method; Encoding.Unicode is the UTF-16 little-endian encoding
    byte[] arr = System.Text.Encoding.Unicode.GetBytes(str);
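Encoding.Unicode writes each UTF-16 code unit low byte first, so the code points can be rebuilt from consecutive byte pairs. A minimal sketch, assuming BMP-only text so that each pair is one code point (Utf16Demo and CodeUnitsFromUtf16LE are hypothetical names):

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Rebuilds each 16-bit code unit from little-endian byte pairs
    // (low byte first), as produced by Encoding.Unicode.GetBytes.
    public static int[] CodeUnitsFromUtf16LE(byte[] bytes)
    {
        var units = new int[bytes.Length / 2];
        for (int i = 0; i < units.Length; i++)
            units[i] = bytes[2 * i] | (bytes[2 * i + 1] << 8);
        return units;
    }

    static void Main()
    {
        string str = "\u092E\u0947\u0930\u093E"; // "मेरा"
        byte[] arr = Encoding.Unicode.GetBytes(str);

        Console.WriteLine(arr.Length);                 // 8
        Console.WriteLine(BitConverter.ToString(arr)); // 2E-09-47-09-30-09-3E-09

        foreach (int u in CodeUnitsFromUtf16LE(arr))
            Console.WriteLine("U+{0:X4}", u);
    }
}
```

In practice, reading s[i] directly is simpler; going through the bytes is only useful when you receive raw UTF-16 data from elsewhere.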
answered 2011-05-05T19:57:22.747

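If you are trying to convert a file from a legacy encoding to Unicode:

Read the file, supplying the correct encoding of the source file, then write it out using the desired Unicode encoding scheme.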

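    using System.IO;
    using System.Text;

    // Code page 57002 is ISCII Devanagari ("x-iscii-de"); use the source file's actual encoding
    using (StreamReader reader = new StreamReader(@"C:\MyFile.txt", Encoding.GetEncoding(57002)))
    using (StreamWriter writer = new StreamWriter(@"C:\MyConvertedFile.txt", false, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }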

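If you are looking for the mapping from Devanagari characters to Unicode code points:

You can find the chart for the Devanagari block (U+0900 to U+097F) on the Unicode Consortium website.

Note that Unicode code points are conventionally written in hexadecimal. So instead of the decimal number 2350, the code point is written as U+092E, and it appears as 092E in the code chart.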
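Converting between the decimal value and the chart notation is a one-liner in each direction. A small sketch (CodePointNotation and ToUPlus are hypothetical names):

```csharp
using System;

static class CodePointNotation
{
    // Formats a code point as in the Unicode charts: "U+" followed by
    // at least four uppercase hex digits (more for values above FFFF).
    public static string ToUPlus(int codePoint)
    {
        return string.Format("U+{0:X4}", codePoint);
    }

    static void Main()
    {
        Console.WriteLine(ToUPlus(2350));               // U+092E
        Console.WriteLine(Convert.ToInt32("092E", 16)); // 2350
    }
}
```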

answered 2011-05-05T19:46:24.970