I am using Visual Studio 2010 with C# to convert text into Unicode. For example, I have a string abc = "मेरा". There are 4 characters in this string, and I need all four Unicode characters. Please help me.
5 Answers
Since a .NET char is a UTF-16 code unit, which is the same as a Unicode code point for every character in the Basic Multilingual Plane (BMP), you can simply enumerate all characters in a string:
var abc = "मेरा";
foreach (var c in abc)
{
    // (int)c is the UTF-16 code unit; for BMP characters that is also the code point
    Console.WriteLine((int)c);
}
which prints:
2350
2375
2352
2366
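The loop above prints UTF-16 code units, so a character outside the BMP (an emoji, for instance) would show up as two surrogate values instead of one code point. The following is a minimal sketch, assuming you want whole code points; the class name `CodePointDemo` and the method `GetCodePoints` are hypothetical, while `char.ConvertToUtf32(string, int)` and `char.IsHighSurrogate` are standard .NET methods.

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Hypothetical helper: returns the Unicode code points of s,
    // combining each UTF-16 surrogate pair into a single value.
    public static List<int> GetCodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            // Reads a surrogate pair starting at i, or a single BMP char.
            int cp = char.ConvertToUtf32(s, i);
            result.Add(cp);
            if (char.IsHighSurrogate(s[i]))
                i++; // skip the low surrogate that was already consumed
        }
        return result;
    }

    static void Main()
    {
        foreach (int cp in GetCodePoints("मेरा"))
            Console.WriteLine("{0} (U+{0:X4})", cp);
    }
}
```

For "मेरा" this gives the same four values as the plain loop, because all four characters are in the BMP; the difference only appears for supplementary characters.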
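Use
System.Text.Encoding.UTF8.GetBytes(abc)
That returns the UTF-8 encoding of your string as a byte array (12 bytes for "मेरा", three per Devanagari character), not the code point values themselves.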
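To see what `Encoding.UTF8.GetBytes` actually produces, here is a small sketch (the class and method names are illustrative only). It prints the byte count, the bytes in hex via `BitConverter.ToString`, and then decodes them back with `Encoding.UTF8.GetString`.

```csharp
using System;
using System.Text;

static class Utf8Demo
{
    // Encodes the string as UTF-8; each Devanagari character (U+0900..U+097F) becomes 3 bytes.
    public static byte[] ToUtf8(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    static void Main()
    {
        byte[] bytes = ToUtf8("मेरा");
        Console.WriteLine(bytes.Length);                  // 12
        Console.WriteLine(BitConverter.ToString(bytes));  // E0-A4-AE-E0-A5-87-E0-A4-B0-E0-A4-BE
        // Decoding with the same encoding recovers the original string.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

The bytes E0-A4-AE are the UTF-8 form of U+092E (म); the code point itself is only recovered by decoding.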
If you have the string s = "मेरा", then you already have the answer.
This string contains four code points in the BMP, which in UTF-16 are represented by 8 bytes (two per character). You can access them by index with s[i], with a foreach loop, etc.
If you want the underlying 8 bytes, you can get them like this:
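string str = "मेरा";
// Encoding.Unicode is UTF-16 little-endian; GetBytes is an instance method, so call it on that instance
byte[] arr = System.Text.Encoding.Unicode.GetBytes(str);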
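Going the other way, each 16-bit code unit can be rebuilt from the little-endian byte pairs that `Encoding.Unicode.GetBytes` returns. This is an illustrative sketch; `Utf16Demo`, `ToUtf16` and `CodeUnits` are hypothetical names, and it assumes the input length is even (as it always is for UTF-16 output).

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Encoding.Unicode is UTF-16 little-endian: each BMP character becomes 2 bytes, low byte first.
    public static byte[] ToUtf16(string s)
    {
        return Encoding.Unicode.GetBytes(s);
    }

    // Rebuilds each 16-bit code unit from a little-endian byte pair.
    public static int[] CodeUnits(byte[] bytes)
    {
        var units = new int[bytes.Length / 2];
        for (int i = 0; i < units.Length; i++)
            units[i] = bytes[2 * i] | (bytes[2 * i + 1] << 8);
        return units;
    }

    static void Main()
    {
        byte[] bytes = ToUtf16("मेरा");
        Console.WriteLine(BitConverter.ToString(bytes)); // 2E-09-47-09-30-09-3E-09
        foreach (int u in CodeUnits(bytes))
            Console.WriteLine(u);
    }
}
```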
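If you are trying to convert a file from a legacy encoding to Unicode:
Read the file, supplying the correct encoding of the source file, and then write it out using the Unicode encoding scheme you want.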
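// Requires: using System.IO; using System.Text;
// Code page 57002 is ISCII Devanagari ("x-iscii-de"); use the code page that matches your source file.
using (StreamReader reader = new StreamReader(@"C:\MyFile.txt", Encoding.GetEncoding(57002)))
using (StreamWriter writer = new StreamWriter(@"C:\MyConvertedFile.txt", false, Encoding.UTF8))
{
    writer.Write(reader.ReadToEnd());
}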
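The same decode-then-encode idea works on an in-memory byte array with the static `Encoding.Convert(Encoding, Encoding, byte[])`. To keep this sketch runnable without an ISCII file, it assumes UTF-16 as the "legacy" source encoding; for a real ISCII source you would pass the matching ISCII encoding instead (for example code page 57002). On .NET Core and .NET 5+, code-page encodings like that are, as far as I know, only available after registering `CodePagesEncodingProvider`. `ConvertDemo` and `Reencode` are hypothetical names.

```csharp
using System;
using System.Text;

static class ConvertDemo
{
    // Re-encodes raw bytes from one encoding to another without touching the file system.
    // Encoding.Unicode stands in for the legacy source encoding in this sketch.
    public static byte[] Reencode(byte[] source, Encoding from, Encoding to)
    {
        return Encoding.Convert(from, to, source);
    }

    static void Main()
    {
        byte[] utf16 = Encoding.Unicode.GetBytes("मेरा");
        byte[] utf8 = Reencode(utf16, Encoding.Unicode, Encoding.UTF8);
        Console.WriteLine("{0} bytes -> {1} bytes", utf16.Length, utf8.Length); // 8 bytes -> 12 bytes
        Console.WriteLine(Encoding.UTF8.GetString(utf8));
    }
}
```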
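If you are looking for a mapping of Devanagari characters to Unicode code points:
You can find the code chart for the Devanagari block (U+0900 to U+097F) on the Unicode Consortium website.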
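Note that Unicode code points are conventionally written in hexadecimal. So instead of the decimal number 2350, the code point is written as U+092E, and it appears as 092E on the code chart.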
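Converting between the decimal values printed earlier and the U+XXXX notation used on the charts only takes hex formatting and parsing. A sketch with hypothetical helper names; it uses the standard `"X4"` format string and `int.Parse` with `NumberStyles.HexNumber`, and hard-codes the Devanagari block range U+0900 to U+097F.

```csharp
using System;
using System.Globalization;

static class CodePointNotation
{
    // Formats a code point in U+XXXX notation (at least 4 hex digits, uppercase).
    public static string ToUPlus(int codePoint)
    {
        return "U+" + codePoint.ToString("X4");
    }

    // Parses "U+092E" (or just "092E") back to the integer code point.
    public static int FromUPlus(string text)
    {
        string hex = text.StartsWith("U+", StringComparison.OrdinalIgnoreCase) ? text.Substring(2) : text;
        return int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    // The Devanagari block spans U+0900..U+097F.
    public static bool IsDevanagari(int codePoint)
    {
        return codePoint >= 0x0900 && codePoint <= 0x097F;
    }

    static void Main()
    {
        foreach (char c in "मेरा")
            Console.WriteLine("{0} = {1} = {2}", (int)c, ToUPlus(c), IsDevanagari(c)); // e.g. 2350 = U+092E = True
    }
}
```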