c# - Recovering text that was encoded with the wrong encoding

Question

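I have a text file exported from a FoxPro (DOS-based) program, but the text contains non-English characters (Arabic [right-to-left]), and the exported strings now look like "¤“îگüَن".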

Is there any way to convert them back to their original values?

score 1 · Accepted Answer

You should read the data using the correct code page.

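// Requires: using System.IO; using System.Text;
public static string ReadFile(string path, int codepage)
{
    // Read the raw bytes and decode them with the given code page.
    return Encoding.GetEncoding(codepage)
        .GetString(File.ReadAllBytes(path));
}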

Call the function with the correct code page ID. For MS-DOS Arabic it should be 708; for the full list of code pages, Wikipedia is a good starting point.

string content = ReadFile(@"c:\test.txt", 708);
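If it is unclear which code page the export used, one way to find out is to decode the same bytes with several candidates and compare the results by eye. The following is only a sketch under a few assumptions: it targets .NET Core / .NET 5+, where legacy code pages are not available until `CodePagesEncodingProvider` is registered (on older targets this may require the `System.Text.Encoding.CodePages` NuGet package); the `CodePageProbe` class and `Decode` helper are hypothetical names; and the candidate list (708, 720, 864) is a guess at common DOS-era Arabic code pages, not something confirmed for this file.

```csharp
using System;
using System.IO;
using System.Text;

public static class CodePageProbe
{
    static CodePageProbe()
    {
        // Makes legacy code pages (708, 720, 864, ...) available
        // to Encoding.GetEncoding on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: decode a byte buffer with one code page.
    public static string Decode(byte[] data, int codepage)
    {
        return Encoding.GetEncoding(codepage).GetString(data);
    }

    public static void Main(string[] args)
    {
        // args[0] is the path of the exported file.
        byte[] data = File.ReadAllBytes(args[0]);

        // Candidate code pages; an assumption, adjust as needed.
        foreach (int cp in new[] { 708, 720, 864 })
        {
            Console.WriteLine($"{cp}: {Decode(data, cp)}");
        }
    }
}
```

The console may not render Arabic correctly, so writing each result to a UTF-8 file with `File.WriteAllText` and opening it in an editor is often easier to judge.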

If the encoding is not supported, you can translate it with a lookup table (a mapping is needed only for characters > 127):

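public static string ReadFile(string path, byte[] translationTable, int codepage)
{
    byte[] content = File.ReadAllBytes(path);
    for (int i = 0; i < content.Length; ++i)
    {
        byte value = content[i];
        // Bytes 0..127 are left as-is; bytes 128..255 are remapped
        // through the 128-entry table to the target code page.
        if (value > 127)
            content[i] = translationTable[value - 128];
    }

    // Decode the remapped bytes with the supported target code page.
    return Encoding.GetEncoding(codepage)
        .GetString(content);
}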

Example translation table:

Index  Original (IS)  Translated (1256)
...
13 141 194
...
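To make the table layout concrete, here is a minimal sketch of how such a table could be built and applied. Only the single mapping listed above (141 to 194) comes from the answer; the `TranslationTableDemo` class, the `BuildTable` and `Translate` names, and the choice of `?` as a placeholder for unmapped bytes are assumptions made for illustration. The remaining entries would have to be filled in from a chart of the source encoding.

```csharp
using System;

public static class TranslationTableDemo
{
    // Builds a 128-entry table covering source bytes 128..255.
    // Unmapped entries default to '?' so gaps show up in the output.
    public static byte[] BuildTable()
    {
        byte[] table = new byte[128];
        for (int i = 0; i < table.Length; i++)
            table[i] = (byte)'?';

        // The one mapping given above: index 13 = 141 - 128,
        // source byte 141 -> target (1256) byte 194.
        table[141 - 128] = 194;

        // Remaining entries must come from the source encoding's chart.
        return table;
    }

    // Same remapping as the loop in ReadFile, applied to a copy
    // of an in-memory buffer instead of a file.
    public static byte[] Translate(byte[] input, byte[] table)
    {
        byte[] output = (byte[])input.Clone();
        for (int i = 0; i < output.Length; i++)
        {
            if (output[i] > 127)
                output[i] = table[output[i] - 128];
        }
        return output;
    }
}
```

For example, `TranslationTableDemo.Translate(new byte[] { 65, 141 }, TranslationTableDemo.BuildTable())` leaves the ASCII byte 65 untouched and turns 141 into 194, which can then be decoded with code page 1256.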
