c# - How to retrieve the unicode decimal representation of the chars in a string containing hindi text?

Question

I am using Visual Studio 2010 with C# to convert text into Unicode values. For example, I have a string abc = "मेरा". There are 4 characters in this string, and I need the Unicode value of all four. Please help me.

score 3 · Accepted Answer

Since a .NET char is a UTF-16 code unit, which equals the Unicode code point for any character in the BMP, you can simply enumerate the characters in a string and cast each one to int:

var abc = "मेरा";

foreach (var c in abc)
{
    Console.WriteLine((int)c);
}

resulting in

2350
2375
2352
2366
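The cast gives a full code point only for BMP characters; a character outside the BMP (an emoji, for instance) is stored as a surrogate pair of two chars. Below is a minimal sketch that handles both cases using char.IsSurrogatePair and char.ConvertToUtf32. The CodePoints helper name is hypothetical, and the top-level statements assume C# 9 or later rather than the Visual Studio 2010 toolchain from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the decimal Unicode code point of every
// character in s. Unlike a plain (int)c cast, a UTF-16 surrogate pair is
// combined into a single code point above U+FFFF.
List<int> CodePoints(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsSurrogatePair(s, i))
        {
            result.Add(char.ConvertToUtf32(s, i)); // combine high + low surrogate
            i++;                                   // skip the low surrogate
        }
        else
        {
            result.Add(s[i]);                      // BMP: the char value is the code point
        }
    }
    return result;
}

foreach (int cp in CodePoints("मेरा"))
    Console.WriteLine(cp);
```

On .NET Core 3.0 and later, string.EnumerateRunes() yields the same values through each Rune's Value property.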

score 2 · Accepted Answer


answered 2011-05-05T19:56:50.630

score 1 · Accepted Answer


use

System.Text.Encoding.UTF8.GetBytes(abc)

That returns the UTF-8 encoded bytes of your string (three bytes per Devanagari character), not the code point values themselves.
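For comparison, here is a small sketch of what Encoding.UTF8.GetBytes actually produces for this string: each Devanagari character becomes three bytes, so the result is 12 encoded bytes rather than four decimal code points. The Utf8Bytes helper is hypothetical, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper: UTF-8 encodes each Devanagari code point
// (U+0900..U+097F) as three bytes, so these are encoded bytes,
// not code point values.
byte[] Utf8Bytes(string s) => Encoding.UTF8.GetBytes(s);

byte[] bytes = Utf8Bytes("मेरा");
Console.WriteLine(bytes.Length);                 // 12 bytes for 4 code points
Console.WriteLine(BitConverter.ToString(bytes)); // hex bytes, dash-separated
```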

answered 2011-05-05T19:34:39.800

score 1 · Accepted Answer

If you have the string s = "मेरा", then you already have the answer.

This string contains four code points in the BMP, which in UTF-16 are represented by 8 bytes. You can access them by index with s[i], with a foreach loop, etc.

If you want the underlying 8 bytes, you can get them like this:

string str = @"मेरा";
byte[] arr = System.Text.Encoding.Unicode.GetBytes(str);
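To see how those 8 bytes relate to the characters, here is a sketch that rebuilds the UTF-16 code units from the byte array. Encoding.Unicode is little-endian UTF-16, so each pair is low byte then high byte. The CodeUnitsFromUtf16Le helper is hypothetical and assumes an even-length input; top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper: rebuilds UTF-16 code units from little-endian
// byte pairs (low byte first, as produced by Encoding.Unicode).
int[] CodeUnitsFromUtf16Le(byte[] bytes)
{
    var units = new int[bytes.Length / 2];
    for (int i = 0; i < units.Length; i++)
        units[i] = bytes[2 * i] | (bytes[2 * i + 1] << 8); // low | (high << 8)
    return units;
}

byte[] arr = Encoding.Unicode.GetBytes("मेरा");
Console.WriteLine(arr.Length);                                   // 8
Console.WriteLine(string.Join(", ", CodeUnitsFromUtf16Le(arr))); // 2350, 2375, 2352, 2366
```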

score 1 · Accepted Answer

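If you are trying to convert a file from a legacy encoding to Unicode:

Read the file, supplying the correct encoding of the source file, then write it out using the Unicode encoding scheme you want.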

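    // Requires: using System.IO; using System.Text;
    // ISCII has one code page per script; 57002 (x-iscii-de) is ISCII Devanagari.
    using (StreamReader reader = new StreamReader(@"C:\MyFile.txt", Encoding.GetEncoding(57002)))
    using (StreamWriter writer = new StreamWriter(@"C:\MyConvertedFile.txt", false, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }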

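If you are looking for a mapping of Devanagari characters to Unicode code points:

You can find the Devanagari code chart on the Unicode Consortium's website.

Note that Unicode code points are conventionally written in hexadecimal. So the code point is written as U+092E rather than as the decimal number 2350, and it appears on the code chart as 092E.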
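A sketch of converting between the two notations, assuming the usual chart convention of "U+" followed by at least four uppercase hex digits. Both helper names are hypothetical, and top-level statements assume C# 9 or later.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers: format a decimal code point in chart notation,
// and parse a chart hex value back to decimal.
string ToChartNotation(int codePoint) => $"U+{codePoint:X4}"; // X4 = at least 4 hex digits
int FromChartHex(string hex) => int.Parse(hex, NumberStyles.HexNumber);

Console.WriteLine(ToChartNotation(2350)); // U+092E
Console.WriteLine(FromChartHex("092E"));  // 2350
```

Code points above U+FFFF simply get five or six hex digits (128512 formats as U+1F600).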
