9

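Is there a way to determine the encoding of a byte array in C#?

I have an arbitrary string, for example "Lorem ipsum áéíóú ñÑç", and I obtain byte arrays from it using several encodings.

I would like a single method that detects the encoding of a byte array, so that I can get the original string value back.

A related problem: I may have a database column that stores BLOBs (as byte arrays). Some of them hold strings that were previously converted to UTF-8 byte arrays, while another application may have converted its strings to byte arrays using the Unicode (UTF-16) encoding.

As a result, the database column contains byte arrays in several encodings. Detecting the encoding of a byte array would be very useful; I need a way to find it.

Test: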

        string DataXmlForSupport = "<support><machinename></machinename><comments>Este es el log 1 áéíóú</comments></support>";
        string DataXmlForSupport2 = "Lorem ipsum áéíóú ñÑç";

        [TestMethod]
        public void Encoding_byte_array_string()
        {
            var uencoding = new System.Text.UnicodeEncoding();
            byte[] data = uencoding.GetBytes(DataXmlForSupport);

            var dataXml = Encoding.Unicode.GetString(data);
            Assert.AreEqual(DataXmlForSupport, dataXml, "Se esperaba resultados Unicode");

            dataXml = Encoding.UTF8.GetString(data);
            Assert.AreNotEqual(DataXmlForSupport, dataXml, "NO Se esperaba resultados UTF8");

            var utf8 = new System.Text.UTF8Encoding();
            data = utf8.GetBytes(DataXmlForSupport2);

            dataXml = Encoding.UTF8.GetString(data);
            Assert.AreEqual(DataXmlForSupport2, dataXml, "Se esperaba resultados UTF8");

            dataXml = Encoding.Unicode.GetString(data);
            Assert.AreNotEqual(DataXmlForSupport2, dataXml, "NO Se esperaba resultados Unicode");

        }
4

3 Answers

4

In short, no. Please see How to detect the character encoding of a text file? for a detailed answer on various encodings and why they can't be automatically determined.
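To illustrate why the bytes alone are ambiguous, here is a minimal sketch; the class and method names are made up for this example. The same four bytes decode without any error under three different encodings, and nothing in the data says which result is the intended one.

```csharp
using System;
using System.Text;

public static class AmbiguityDemo
{
    // Hypothetical helper: decodes the same bytes with three encodings.
    public static string[] DecodeAllWays(byte[] data)
    {
        return new[]
        {
            Encoding.UTF8.GetString(data),                      // UTF-8
            Encoding.Unicode.GetString(data),                   // UTF-16 LE
            Encoding.GetEncoding("iso-8859-1").GetString(data)  // Latin-1
        };
    }

    public static void Main()
    {
        // "áé" encoded as UTF-8 is the byte sequence C3 A1 C3 A9.
        byte[] data = Encoding.UTF8.GetBytes("áé");

        // None of the decoders throws; each one returns some string.
        foreach (string s in DecodeAllWays(data))
            Console.WriteLine(s);
    }
}
```

Only the first result is the original text. The other two are valid strings as well, so a decoder that "succeeds" proves nothing about the encoding.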

Your best solution is to convert the string from its original encoding to UTF-8 and convert that to a byte array. Then you'll know your byte array's encoding...
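A rough sketch of that normalization step, assuming the writer still knows the source encoding at the moment it stores the data. `Utf8Normalizer` and `ToUtf8` are hypothetical names; `Encoding.Convert` is the framework call doing the work.

```csharp
using System;
using System.Text;

public static class Utf8Normalizer
{
    // Hypothetical helper: re-encodes bytes from a known source encoding to UTF-8.
    public static byte[] ToUtf8(byte[] data, Encoding sourceEncoding)
    {
        return Encoding.Convert(sourceEncoding, Encoding.UTF8, data);
    }

    public static void Main()
    {
        const string original = "Lorem ipsum áéíóú ñÑç";

        // Bytes produced by an application that uses UTF-16 ("Unicode" in .NET).
        byte[] utf16 = Encoding.Unicode.GetBytes(original);

        // Normalize before storing, so every BLOB in the column is UTF-8.
        byte[] utf8 = ToUtf8(utf16, Encoding.Unicode);

        // Readers can now always decode with UTF-8.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == original);
    }
}
```

This only works if every writer applies it before storing. For rows that already exist in mixed encodings, the source encoding still has to come from somewhere else, for example an extra column recording it.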

answered 2013-10-22T13:48:23.943
1

I realize I'm late to the party here, but I just needed to do this myself and found a good way:

byte[] data; // Populate this however you see fit with your data
string text;
Encoding enc;
using (StreamReader reader = new StreamReader(new MemoryStream(data), 
                                              detectEncodingFromByteOrderMarks: true))
{
    text = reader.ReadToEnd();
    enc = reader.CurrentEncoding; // detected from the byte order mark (BOM); without a BOM the reader assumes UTF-8
}
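Note that this detection relies on a byte order mark (BOM) at the start of the data. The following sketch, with the hypothetical names `BomDetectionDemo` and `Detect`, shows both cases: UTF-16 bytes with a BOM are recognized, while the same bytes without one are read as UTF-8.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: returns the encoding StreamReader settles on.
    public static Encoding Detect(byte[] data, out string text)
    {
        using (var reader = new StreamReader(new MemoryStream(data),
                                             detectEncodingFromByteOrderMarks: true))
        {
            text = reader.ReadToEnd();     // detection happens on the first read
            return reader.CurrentEncoding;
        }
    }

    public static void Main()
    {
        byte[] body = Encoding.Unicode.GetBytes("Lorem ipsum áéíóú");

        // Encoding.GetBytes does not emit a BOM; prepend it explicitly.
        byte[] withBom = Encoding.Unicode.GetPreamble().Concat(body).ToArray();

        string text;
        Console.WriteLine(Detect(withBom, out text).WebName); // BOM present
        Console.WriteLine(Detect(body, out text).WebName);    // no BOM: UTF-8 fallback
    }
}
```

Since `Encoding.GetBytes` never writes a BOM, byte arrays produced as in the question will usually not carry one, and the reader will report UTF-8 for them regardless of their real encoding.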
answered 2020-08-21T16:05:11.137
-1

As a complement to the other answers, you can try the following:

string str = BitConverter.ToString(byte_array); // hex dump such as "4C-6F-72", not decoded text
byte[] utf8Bytes = Encoding.UTF8.GetBytes(str); // UTF-8 bytes of that hex string
answered 2019-09-20T12:40:30.727