utf-8 - Encoding conversion of a large file

Question

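I am facing a large (~18 GB) file that was exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored on a machine running Linux, but I have not found a way to convert it to UTF-8.

At first I tried iconv, but the file is too big for it. My next approach was to use split and convert the pieces one by one, but that did not work either: the conversion produced a lot of errors.

So, any ideas on how to convert it to UTF-8? Any help would be much appreciated.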

score 4 · Accepted Answer

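Since you are using SQL Server, I assume your platform is Windows. In the simplest case, you can write a quick and dirty .NET application that reads the source file line by line and writes out the converted file. Something like this: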

using System;
using System.IO;
using System.Text;

namespace UTFConv {
    class Program {
        static void Main(string[] args) {
            try {
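                // Encoding.Unicode is UTF-16 little-endian, matching the SQL Server "Unicode" export.
                Encoding encSrc = Encoding.Unicode;
                // Note: Encoding.UTF8 writes a byte order mark (BOM) at the start of the output.
                Encoding encDst = Encoding.UTF8;
                // long rather than uint, so the counter cannot overflow on a very large file.
                long lines = 0;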
                using (StreamReader src = new StreamReader(args[0], encSrc)) {
                    using (StreamWriter dest = new StreamWriter(args[1], false, encDst)) {
                        // Copy one line at a time so memory use does not grow with the file size.
                        string ln;
                        while ((ln = src.ReadLine()) != null) {
                            lines++;
                            dest.WriteLine(ln);
                        }
                    }
                }
                Console.WriteLine("Converted {0} lines", lines);
            } catch (Exception x) {
                Console.WriteLine("Problem converting the file: {0}", x.Message);
            }
        }
    }
}

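Just open Visual Studio, start a new C# console application project, paste this code in, compile it, and run it from the command line. The first argument is your source file and the second is your destination file. It should work.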
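
If you would rather do the conversion on the Linux machine where the file now lives, the same streaming idea can be sketched in Python. Treat this as a minimal sketch rather than a tool tested against your file: the function name `utf16le_to_utf8`, the 1 MiB chunk size, and the choice to drop a leading BOM are my assumptions. It relies on the incremental decoder from the standard `codecs` module, which holds back an incomplete 2-byte code unit or half of a surrogate pair at the end of a chunk until the next chunk arrives. Cutting the file at arbitrary byte offsets with split gives no such guarantee, which is one plausible cause of the errors described in the question.

```python
import codecs


def utf16le_to_utf8(src_path, dst_path, chunk_size=1 << 20):
    """Stream-convert a UTF-16LE file to UTF-8 without loading it into memory.

    Hypothetical helper: the name and the default chunk size are illustrative.
    """
    # The incremental decoder buffers bytes that do not yet form a complete
    # character, so chunk boundaries may fall anywhere in the input.
    decoder = codecs.getincrementaldecoder("utf-16-le")(errors="strict")
    at_start = True

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            text = decoder.decode(chunk)
            if at_start and text:
                # Assumption: a leading BOM (U+FEFF) is not wanted in the output.
                if text.startswith("\ufeff"):
                    text = text[1:]
                at_start = False
            dst.write(text.encode("utf-8"))

        # Signal end of input; with errors="strict" this raises
        # UnicodeDecodeError if the file ends in the middle of a character.
        tail = decoder.decode(b"", final=True)
        dst.write(tail.encode("utf-8"))
```

You would call it as `utf16le_to_utf8("export.txt", "export.utf8.txt")`, where both file names are placeholders. Unlike the line-by-line C# version, this copies the text unchanged, so the original line endings are preserved.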
