
Below is the code with a description of my problem:

  1. I need to find the encoding of this file, but not now!

    string FilePath = @"C:\01 New.txt";
    // Read the whole file as raw bytes; no text encoding is involved at this stage.
    byte[] binaryData = System.IO.File.ReadAllBytes(FilePath);
    // Base64 is a lossless text representation of those bytes.
    string base64String = System.Convert.ToBase64String(binaryData);
    Console.WriteLine("base64String is " + base64String);
    

    Please assume that the above process is done by something else, and it only returns "base64String". Now I need to read it properly.

  2. For that, I need the "ENCODING" of the base64String:

    byte[] s = Convert.FromBase64String(base64String);
    switch (GET_ENCODING(base64String))
    {
      case "ASCII":
        Console.WriteLine("ASCII text is " + Encoding.ASCII.GetString(s).Trim()); break;
      case "Default":
        Console.WriteLine("Default text is " + Encoding.Default.GetString(s).Trim()); break;
      case "UTF7":
        Console.WriteLine("UTF7 text is " + Encoding.UTF7.GetString(s).Trim()); break;
      case "UTF8":
        Console.WriteLine("UTF8 text is " + Encoding.UTF8.GetString(s).Trim()); break;
      case "BigEndianUnicode":
        Console.WriteLine("BigEndianUnicode " + Encoding.BigEndianUnicode.GetString(s).Trim()); break;
      case "UTF32":
        Console.WriteLine("UTF32 text is " + Encoding.UTF32.GetString(s).Trim()); break;
      default:
        break;
    }
    

1 Answer


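The Base64 encoding is irrelevant to the problem, because you already know it is the source encoding. What you really have is a stream of bytes to decode into text without knowing the target encoding or character set. That means your text is effectively corrupted; as @deceze commented, the best thing to do is make sure the encoding is always known and available.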
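To illustrate why Base64 can be set aside, here is a minimal sketch showing that a Base64 round trip returns exactly the original bytes, so it neither adds nor removes any information about the text encoding. The class and method names are hypothetical and used only for illustration.

```csharp
using System;
using System.Linq;

public static class Base64RoundTrip
{
    // Returns true when decoding the Base64 text yields exactly the input bytes.
    public static bool PreservesBytes(byte[] original)
    {
        string base64 = Convert.ToBase64String(original);
        byte[] decoded = Convert.FromBase64String(base64);
        return original.SequenceEqual(decoded);
    }
}
```

Whatever encoding question exists for the decoded bytes existed for the file's bytes before they were Base64-encoded.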

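If the text is XML, HTML, or MIME, you can do this in two passes:

  1. Decode as ASCII/UTF-8, then parse or search for the charset attribute, whose value will be something like "UTF-8" or "ISO-8859-1".
  2. Decode again using the character set identified in step 1.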
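The two passes above could be sketched roughly as follows. This is an illustration under assumptions, not a complete parser: the `DeclaredCharset` class, the regular expression, and the 1024-byte prefix limit are all hypothetical choices, and the pattern only looks for a `charset=` or `encoding=` declaration near the start of the data.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class DeclaredCharset
{
    // Matches charset=... (HTML meta tag, MIME Content-Type header)
    // or encoding="..." (XML declaration). Assumed pattern, not exhaustive.
    private static readonly Regex Declaration = new Regex(
        @"(?:charset|encoding)\s*=\s*[""']?([A-Za-z0-9._:-]+)",
        RegexOptions.IgnoreCase);

    // Returns the decoded text, or null when no usable declaration is found.
    public static string Decode(byte[] data)
    {
        // Pass 1: ASCII is enough to read the declaration itself;
        // non-ASCII bytes become '?' and do not affect the search.
        int prefixLength = Math.Min(data.Length, 1024);
        string prefix = Encoding.ASCII.GetString(data, 0, prefixLength);
        Match match = Declaration.Match(prefix);
        if (!match.Success) return null;

        Encoding declared;
        try
        {
            // Throws ArgumentException for names this runtime does not know.
            declared = Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            return null;
        }

        // Pass 2: decode the full byte stream with the declared character set.
        return declared.GetString(data);
    }
}
```

With the bytes from the question, this would be called as `DeclaredCharset.Decode(Convert.FromBase64String(base64String))`. On .NET Core and later, some legacy code pages are only available after registering `CodePagesEncodingProvider`; unknown names simply yield `null` here.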

Otherwise, you will need a heuristic to detect the encoding, and that will never be 100% reliable.
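As one example of what such a heuristic might look like, the sketch below checks for a byte order mark and then tests whether the bytes are valid UTF-8, falling back to a caller-supplied encoding if not. The `EncodingGuess` class and its `fallback` parameter are hypothetical, and this is only one of many possible heuristics; it can guess wrong, for instance on single-byte encodings that it cannot tell apart.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Best-effort guess; "fallback" is what the caller wants when nothing matches.
    public static Encoding Guess(byte[] data, Encoding fallback)
    {
        // Check UTF-32 LE before UTF-16 LE: its BOM starts with the same two bytes.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return Encoding.UTF32;
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith(data, 0xFF, 0xFE)) return Encoding.Unicode;
        if (StartsWith(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;

        // No BOM: strict UTF-8 decoding throws on any invalid byte sequence.
        try
        {
            new UTF8Encoding(false, true).GetString(data);
            return Encoding.UTF8; // also covers pure ASCII, a subset of UTF-8
        }
        catch (DecoderFallbackException)
        {
            return fallback;
        }
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The result could stand in for the `GET_ENCODING` placeholder in the question, for example `EncodingGuess.Guess(s, Encoding.Default).GetString(s)`.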

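Edit: XML/HTML may be encoded as something other than ASCII/UTF-8, and so may MIME. That means a heuristic is needed even for these file types, unless you know the encoding can only be one of ASCII, UTF-8, or ISO-8859-1, which share the same first 128 characters.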

answered 2013-10-25T02:41:46.343