c# - Split a PDF by chapters from its table of contents

Question

I'm using GemBox.Pdf and I need to extract the individual chapters of a PDF file into separate PDF files.

The first page (and possibly the second) contains the TOC (table of contents), and I need to split the remaining PDF pages according to it:

(Screenshot: PDF file with chapters and a table of contents)

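Also, the split PDF documents should be named after the chapters they contain. I can already split the PDF by a fixed number of pages per document (I figured that out using this example):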

using (var source = PdfDocument.Load("Chapters.pdf"))
{
    int pagesPerSplit = 3;
    int count = source.Pages.Count;

    // Start at index 1 to skip the TOC on the first page.
    for (int index = 1; index < count; index += pagesPerSplit)
    {
        using (var destination = new PdfDocument())
        {
            // Stop at the last page so the final file may hold fewer than pagesPerSplit pages.
            for (int splitIndex = 0; splitIndex < pagesPerSplit && index + splitIndex < count; splitIndex++)
                destination.Pages.AddClone(source.Pages[index + splitIndex]);

            destination.Save("Chapter " + index + ".pdf");
        }
    }
}

But I don't know how to read and process the TOC and split the document into chapters based on its entries.

score 2 · Accepted Answer

You should iterate through the document's bookmarks (outlines) and split the document based on the bookmarks' destination pages.

For example, try this:

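using System.Collections.Generic;
using System.Linq;
using GemBox.Pdf;

// GemBox.Pdf needs a license key before first use; this one enables free limited mode.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

using (var source = PdfDocument.Load("Chapters.pdf"))
{
    PdfOutlineCollection outlines = source.Outlines;

    // Map each page to its zero-based index so a bookmark's destination page can be looked up.
    PdfPages pages = source.Pages;
    Dictionary<PdfPage, int> pageIndexes = pages
        .Select((page, index) => new { page, index })
        .ToDictionary(item => item.page, item => item.index);

    for (int index = 0, count = outlines.Count; index < count; ++index)
    {
        PdfOutline outline = outlines[index];
        PdfOutline nextOutline = index + 1 < count ? outlines[index + 1] : null;

        // A chapter runs from its own bookmark page up to (but not including)
        // the next bookmark's page; the last chapter runs to the end of the document.
        int pageStartIndex = pageIndexes[outline.Destination.Page];
        int pageEndIndex = nextOutline != null ?
            pageIndexes[nextOutline.Destination.Page] :
            pages.Count;

        using (var destination = new PdfDocument())
        {
            while (pageStartIndex < pageEndIndex)
            {
                destination.Pages.AddClone(pages[pageStartIndex]);
                ++pageStartIndex;
            }

            // Name the output file after the chapter's bookmark title.
            destination.Save($"{outline.Title}.pdf");
        }
    }
}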
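
The page-range arithmetic in that loop can be looked at on its own, without GemBox.Pdf. The following is only a minimal sketch: `ChapterRanges` and `Compute` are hypothetical names, and it assumes the bookmarks appear in page order. It takes the zero-based start page of each bookmark plus the total page count and returns half-open `[Start, End)` ranges, so pages before the first bookmark (such as the TOC) fall outside every range.

```csharp
using System;
using System.Collections.Generic;

public static class ChapterRanges
{
    // Hypothetical helper: 'starts' holds the zero-based start page of each
    // bookmark in outline order; 'pageCount' is the number of pages in the source.
    public static List<(int Start, int End)> Compute(IReadOnlyList<int> starts, int pageCount)
    {
        var ranges = new List<(int Start, int End)>();

        for (int i = 0; i < starts.Count; i++)
        {
            int start = starts[i];

            // A chapter ends where the next one begins; the last one ends at the document's end.
            int end = i + 1 < starts.Count ? starts[i + 1] : pageCount;

            // Reject bookmarks that are out of order or point outside the document.
            if (start < 0 || end > pageCount || start > end)
                throw new ArgumentException("Bookmark start pages must be ascending and within the document.");

            ranges.Add((start, end));
        }

        return ranges;
    }
}
```

For instance, `ChapterRanges.Compute(new[] { 2, 5, 9 }, 12)` yields the ranges (2, 5), (5, 9) and (9, 12), leaving pages 0 and 1 unassigned.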

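Note that, judging from the screenshot, your chapter bookmarks seem to include ordinal numbers (Roman numerals). If you want, you can easily remove them with something like this: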

destination.Save($"{outline.Title.Substring(outline.Title.IndexOf(' ') + 1)}.pdf");
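
Bookmark titles can also contain characters that are not allowed in file names (a `/`, for example). As a rough sketch, the ordinal stripping above could be combined with a cleanup step; `ChapterFileNames.FromTitle` and the `"Untitled"` fallback are hypothetical, and the only library call beyond LINQ is the standard `Path.GetInvalidFileNameChars`.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ChapterFileNames
{
    // Hypothetical helper: turns a bookmark title such as "II. Getting Started"
    // into a file name such as "Getting Started.pdf".
    public static string FromTitle(string title, bool stripOrdinal = true)
    {
        string name = title.Trim();

        // Drop everything up to the first space, as in the Substring call above.
        // If there is no space, IndexOf returns -1 and the whole title is kept.
        if (stripOrdinal)
            name = name.Substring(name.IndexOf(' ') + 1);

        // Replace characters the current platform does not allow in file names.
        char[] invalid = Path.GetInvalidFileNameChars();
        name = new string(name.Select(c => invalid.Contains(c) ? '_' : c).ToArray());

        // Fall back to a placeholder if nothing usable is left.
        if (string.IsNullOrWhiteSpace(name))
            name = "Untitled";

        return name + ".pdf";
    }
}
```

With such a helper, the save call would read `destination.Save(ChapterFileNames.FromTitle(outline.Title));`.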
