c# - How to zip multiple files with a maximum output size while keeping file pairs together

Question

How can I zip files into separate zip files that do not exceed a certain size, while keeping file pairs together?

I would like to do this with a CLI from the command prompt, a batch file, or some C# code. I don't care whether DotNetZip, 7Zip, or WinZip is used.

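An example scenario: I have a large directory, C:\LargeDirectory, of 25 GB. The directory contains files that come in pairs, for example File1.pdf and File1.ind. Each pair needs to stay together in the same zip file. Each output zip file needs to stay under, say, 2 GB.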

Edit: each output zip will contain multiple pairs. If a pair would push an output zip over 2 GB, that pair goes into the next zip instead.

score 3 · Accepted Answer

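Here is some C# code that does what the question asks. It zips the files of a directory into separate zip files whose total size does not exceed a given limit, while keeping file pairs together. In this case the two file types (.pdf and .idf) are hard-coded, but with a bit more customization it could be made more generic.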

// Requires the DotNetZip library (using Ionic.Zip;), plus System.IO, System.Text
// and System.Collections.Generic. maxFolderSize is in bytes and is compared against
// the uncompressed file sizes.
private void CreateZip(string largeDir, string splitIntoDir, double maxFolderSize)
{
    int fileNumber = 1;
    List<String> files = new List<String>(Directory.GetFiles(largeDir, "*.pdf"));
    StringBuilder outputZip = new StringBuilder(splitIntoDir + Path.DirectorySeparatorChar + Path.GetFileName(largeDir) + "_" + fileNumber + @".zip");
    double currentOutputSize = 0;
    List<String> toAdd = new List<String>();
    foreach (String file in files)
    {
        String pairFile = Path.ChangeExtension(file, ".idf");
        if (!File.Exists(file) || !File.Exists(pairFile))
            continue; // skip a PDF that has no matching .idf file

        double pairSize = new FileInfo(file).Length + new FileInfo(pairFile).Length;

        // The pair does not fit: write out the current zip and start the next one.
        if (toAdd.Count > 0 && currentOutputSize + pairSize > maxFolderSize)
        {
            using (ZipFile zip = new ZipFile(outputZip.ToString()))
            {
                foreach (String aFile in toAdd)
                    zip.AddFile(aFile, "");
                zip.Save();
            }
            toAdd.Clear();
            fileNumber += 1;
            outputZip.Clear();
            outputZip.Append(splitIntoDir + Path.DirectorySeparatorChar + Path.GetFileName(largeDir) + "_" + fileNumber + @".zip");
            currentOutputSize = 0;
        }

        // Always add both files of the pair to the same zip.
        toAdd.AddRange(new String[] { file, pairFile });
        currentOutputSize += pairSize;
    }

    // Write the last, partially filled zip.
    if (toAdd.Count > 0)
    {
        using (ZipFile zip = new ZipFile(outputZip.ToString()))
        {
            foreach (String aFile in toAdd)
                zip.AddFile(aFile, "");
            zip.Save();
        }
    }
}

Update: improved the speed of the algorithm.
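
The batching rule itself (start a new zip whenever the next pair would not fit) can be separated from the file system and zip calls, which makes it easier to check. The following is only a minimal sketch of that rule: `PairBatcher` and `Group` are hypothetical names, not part of the answer or of DotNetZip, and the sizes are assumed to be the combined uncompressed size of each pair in bytes. Uncompressed sizes usually overestimate the final zip size, so the limit tends to be conservative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class PairBatcher
{
    // pairSizes[i] is the combined size in bytes of one file pair (e.g. File1.pdf + File1.idf).
    // Returns batches of indices into pairSizes. The total of each batch stays within maxBytes,
    // except when a single pair alone is larger than maxBytes (it then gets a batch of its own).
    public static List<List<int>> Group(IReadOnlyList<long> pairSizes, long maxBytes)
    {
        var batches = new List<List<int>>();
        var current = new List<int>();
        long currentSize = 0;

        for (int i = 0; i < pairSizes.Count; i++)
        {
            // Close the current batch if this pair would not fit into it.
            if (current.Count > 0 && currentSize + pairSizes[i] > maxBytes)
            {
                batches.Add(current);
                current = new List<int>();
                currentSize = 0;
            }

            current.Add(i);
            currentSize += pairSizes[i];
        }

        // Keep the last, partially filled batch.
        if (current.Count > 0)
            batches.Add(current);

        return batches;
    }
}
```

For example, with pair sizes 5, 4, 3, 9, 1 and a limit of 10, this yields three batches: pairs 0 and 1, pair 2 alone, and pairs 3 and 4. Each batch would then be written to its own zip file.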

score 1 · Accepted Answer

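I have rewritten dkroy's algorithm because I think it does too much casting and queries the disk too often.

Now it loads all the file information in one go and leaves the disk alone until we need to write a zip file. I also optimized the comparison: it is done in memory, using references instead of creating new objects on every iteration. I also changed some variables to standard .NET types and removed the StringBuilder. You can check the code: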

// Requires the DotNetZip library (using Ionic.Zip;), plus System.IO, System.Linq
// and System.Collections.Generic. maxFolderSize is in bytes.
private void CreateZip(string largeDir, string splitIntoDir, double maxFolderSize)
{
    int fileNumber = 1;

    // Get all the .pdf and .idf files at once
    FileInfo[] files = new DirectoryInfo(largeDir).GetFiles("*.pdf");
    FileInfo[] filesPair = new DirectoryInfo(largeDir).GetFiles("*.idf");

    List<FileInfo> toAdd = new List<FileInfo>();

    // Match the file names without extension in memory and create an anonymous object
    // that holds both files of each pair
    var pairs = files.Join(filesPair, f => Path.GetFileNameWithoutExtension(f.FullName),
        idx => Path.GetFileNameWithoutExtension(idx.FullName), (f, idx) => new {Pdf = f, Index = idx});

    long currentOutputSize = 0;
    string outputZip = string.Format("{0}{1}{2}_{3}.zip", splitIntoDir, Path.DirectorySeparatorChar, Path.GetFileName(largeDir), fileNumber);

    // Iterate over the matched pairs
    foreach (var pair in pairs)
    {
        long pairSize = pair.Pdf.Length + pair.Index.Length;

        // The pair does not fit: write out the current zip and start a new one
        if (toAdd.Count > 0 && currentOutputSize + pairSize > maxFolderSize)
        {
            using (ZipFile zip = new ZipFile(outputZip))
            {
                toAdd.ForEach(f => zip.AddFile(f.FullName, string.Empty));
                zip.Save();
            }

            toAdd.Clear();
            fileNumber++;
            currentOutputSize = 0;
            outputZip = string.Format("{0}{1}{2}_{3}.zip", splitIntoDir, Path.DirectorySeparatorChar, Path.GetFileName(largeDir), fileNumber);
        }

        // Add both files of the current pair
        toAdd.Add(pair.Pdf);
        toAdd.Add(pair.Index);
        currentOutputSize += pairSize;
    }

    // Write the last, partially filled zip
    if (toAdd.Count > 0)
    {
        using (ZipFile zip = new ZipFile(outputZip))
        {
            toAdd.ForEach(f => zip.AddFile(f.FullName, string.Empty));
            zip.Save();
        }
    }
}
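
Since the question does not require a particular library, the step that writes one batch of files to a zip could also be done with the built-in System.IO.Compression API instead of DotNetZip. The sketch below assumes exactly that swap; `BatchZipWriter` and `Write` are hypothetical names. On .NET Framework it needs references to the System.IO.Compression and System.IO.Compression.FileSystem assemblies, and its `ZipFile` class is a different type from `Ionic.Zip.ZipFile`, so the two namespaces should not be imported into the same file without qualification.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only: writes one batch of files into a single zip.
public static class BatchZipWriter
{
    public static void Write(string zipPath, IEnumerable<string> filePaths)
    {
        // ZipArchiveMode.Create expects that zipPath does not exist yet.
        using (ZipArchive archive = ZipFile.Open(zipPath, ZipArchiveMode.Create))
        {
            foreach (string path in filePaths)
            {
                // Store each file at the root of the archive under its own file name.
                archive.CreateEntryFromFile(path, Path.GetFileName(path));
            }
        }
    }
}
```

Called once per batch with the paths of the paired files, this would produce the same kind of independent zip files as the DotNetZip code.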

score 0 · Accepted Answer

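I believe you want to split the archive into several equal parts. This can be done with 7-Zip and a batch file.

To split the archive into equal-sized volumes: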

@echo off
@set "ZipPath=%ProgramFiles%\7-Zip\7z.exe"
@IF NOT EXIST "%ZipPath%" set "ZipPath=%ProgramFiles(x86)%\7-Zip\7z.exe"

"%ZipPath%" a -mx9 -mmt4 -m0=lzma:d27:fb128 -v2g "C:\foo.7z" "C:\LargeDirectory"

PAUSE

This splits the archive into 2 GB volumes. To extract the archive:

@echo off

@set "ZipPath=%ProgramFiles%\7-Zip\7z.exe"
@IF NOT EXIST "%ZipPath%" set "ZipPath=%ProgramFiles(x86)%\7-Zip\7z.exe"

"%ZipPath%" x "-oC:\LargeDirectory" "C:\foo.7z.001"

PAUSE

You can also refer to the 7-Zip command-line documentation for more information. Hope this helps.
