itextsharp - Trying to apply redaction results in an exception

Question

I followed the steps to create annotations and apply redaction with iText 5.5.9. This is my code:

using (var stamper = new PdfStamper(pdfReader, new FileStream(newFilePath, FileMode.Create)))
{
    // Redact the values.
    var pdfAnot1 = new PdfAnnotation(stamper.Writer, new Rectangle(165f, 685f, 320f, 702f));
    pdfAnot1.Title = "First Page";
    pdfAnot1.Put(PdfName.SUBTYPE, PdfName.REDACT);
    pdfAnot1.Put(PdfName.IC, new PdfArray(new[] { 0f, 0f, 0f }));
    pdfAnot1.Put(PdfName.OC, new PdfArray(new[] { 1f, 0f, 0f })); // red outline
    stamper.AddAnnotation(pdfAnot1, 1);
    for (var i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        var pdfAnot2 = new PdfAnnotation(stamper.Writer, new Rectangle(220f, 752f, 420f, 768f));
        pdfAnot2.Title = "Header";
        pdfAnot2.Put(PdfName.SUBTYPE, PdfName.REDACT);
        pdfAnot2.Put(PdfName.IC, new PdfArray(new[] { 0f, 0f, 0f }));
        pdfAnot2.Put(PdfName.OC, new PdfArray(new[] { 1f, 0f, 0f })); // red outline
        stamper.AddAnnotation(pdfAnot2, i);
    }

    var cleaner = new PdfCleanUpProcessor(stamper);
    cleaner.CleanUp();
}

However, I always get the following exception in the PdfCleanUpProcessor constructor:

Object reference not set to an instance of an object.
   at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnots(Int32 page, PdfDictionary pageDict)
   at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnots()
   at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor..ctor(PdfStamper pdfStamper)

It looks like the assignment to annotDict in ExtractLocationsFromRedactAnnots yields a null reference, so the line after it throws the exception:

    /**
     * Extracts locations from the redact annotations contained in the document and applied to the given page.
     */
    private IList<PdfCleanUpLocation> ExtractLocationsFromRedactAnnots(int page, PdfDictionary pageDict) {
        List<PdfCleanUpLocation> locations = new List<PdfCleanUpLocation>();

        if (pageDict.Contains(PdfName.ANNOTS)) {
            PdfArray annotsArray = pageDict.GetAsArray(PdfName.ANNOTS);

            for (int i = 0; i < annotsArray.Size; ++i) {
                PdfIndirectReference annotIndirRef = annotsArray.GetAsIndirectObject(i);
                PdfDictionary annotDict = annotsArray.GetAsDict(i);
                PdfName annotSubtype = annotDict.GetAsName(PdfName.SUBTYPE);

                if (annotSubtype.Equals(PdfName.REDACT)) {
                    SaveRedactAnnotIndirRef(page, annotIndirRef.ToString());
                    locations.AddRange(ExtractLocationsFromRedactAnnot(page, i, annotDict));
                }
            }
        }

        return locations;
    }

Any idea why this happens? A sample PDF is here.

score 1 · Accepted Answer

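There are two issues here, one in the OP's code and one in iText(Sharp).

The issue in the OP's code

One has to be aware that a PdfReader/PdfStamper pair is not architected as an in-memory document that is manipulated and only saved at the end. Instead, changes made through the stamper are generally written to the output stream as soon as possible and are not necessarily visible to other code working on the same stamper.

The rationale is that the iText architecture (which may look wild in versions before 7.x) was built to allow operations with a low resource footprint. This matters a lot in server applications that may have to process many PDFs in parallel.

In the case at hand, the OP's code first adds Redact annotations and then, in the same run, tries to use those annotations for the cleanup. This does not work. Instead, the OP should add the annotations in a first pass and apply the cleanup in a second one, i.e.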

using (PdfReader pdfReader = new PdfReader(source))
using (var stamper = new PdfStamper(pdfReader, new FileStream(temp, FileMode.Create)))
{
    // ... add REDACT annotations
}

using (PdfReader pdfReader = new PdfReader(temp))
using (var stamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create)))
{
    var cleaner = new PdfCleanUpProcessor(stamper);
    cleaner.CleanUp();
}
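
If a temporary file is undesirable, the same two passes can be chained through memory. The following is only a minimal sketch under assumptions: the class and method names are made up for illustration, source and dest are hypothetical file paths, the whole PDF is assumed to fit in memory, and the rectangle is the one from the question. It annotates a single page only, because of the iTextSharp issue described further down.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup;

static class TwoPassRedaction // hypothetical helper, not part of iTextSharp
{
    public static void Redact(string source, string dest)
    {
        byte[] annotated;

        // Pass 1: add the Redact annotation and finish the stamper.
        using (var pdfReader = new PdfReader(source))
        using (var buffer = new MemoryStream())
        {
            using (var stamper = new PdfStamper(pdfReader, buffer))
            {
                var annot = new PdfAnnotation(stamper.Writer, new Rectangle(165f, 685f, 320f, 702f));
                annot.Put(PdfName.SUBTYPE, PdfName.REDACT);
                annot.Put(PdfName.IC, new PdfArray(new[] { 0f, 0f, 0f }));
                stamper.AddAnnotation(annot, 1);
            }
            // The stamper has been disposed, so the PDF is complete.
            // ToArray() still works on a closed MemoryStream.
            annotated = buffer.ToArray();
        }

        // Pass 2: a fresh reader sees the annotation; now clean up.
        using (var pdfReader = new PdfReader(annotated))
        using (var output = new FileStream(dest, FileMode.Create))
        using (var stamper = new PdfStamper(pdfReader, output))
        {
            var cleaner = new PdfCleanUpProcessor(stamper);
            cleaner.CleanUp();
        }
    }
}
```

The important point is that the second PdfReader is created only after the first PdfStamper has been closed; whether the intermediate result lives in a file or in a byte array is a resource trade-off.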

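Or do not use Redact annotations at all: after all, why add annotations only to remove them again immediately? For this purpose PdfCleanUpProcessor has a second constructor that takes the cleanup locations directly: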

/**
 * Creates a {@link com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor} object based on the
 * given {@link java.util.List} of {@link com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpLocation}s
 * representing regions to be erased from the document.
 *
 * @param pdfCleanUpLocations list of locations to be cleaned up {@see PdfCleanUpLocation}
 * @param pdfStamper          A{@link com.itextpdf.text.pdf.PdfStamper} object representing the document which redaction
 *                            applies to.
 */
public PdfCleanUpProcessor(IList<PdfCleanUpLocation> pdfCleanUpLocations, PdfStamper pdfStamper)
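
Applied to the regions from the question, that constructor could be used roughly as follows. This is a sketch rather than tested code: the class and method names, source and dest are hypothetical, and black is assumed as the fill color because the question set IC to 0, 0, 0.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup;

static class DirectRedaction // hypothetical helper, not part of iTextSharp
{
    public static void Redact(string source, string dest)
    {
        using (var pdfReader = new PdfReader(source))
        using (var output = new FileStream(dest, FileMode.Create))
        using (var stamper = new PdfStamper(pdfReader, output))
        {
            var locations = new List<PdfCleanUpLocation>();

            // Region that only exists on the first page.
            locations.Add(new PdfCleanUpLocation(
                1, new Rectangle(165f, 685f, 320f, 702f), BaseColor.BLACK));

            // Header region repeated on every page.
            for (var i = 1; i <= pdfReader.NumberOfPages; i++)
            {
                locations.Add(new PdfCleanUpLocation(
                    i, new Rectangle(220f, 752f, 420f, 768f), BaseColor.BLACK));
            }

            // No annotations involved, so a single pass is enough.
            var cleaner = new PdfCleanUpProcessor(locations, stamper);
            cleaner.CleanUp();
        }
    }
}
```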

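The issue in iText(Sharp)

PdfCleanUpProcessor has a member dictionary clippingRects, to which the regions of redact annotations are added, keyed by the annotation's index in its page's Annots array: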

private IList<PdfCleanUpLocation> ExtractLocationsFromRedactAnnot(int page, int annotIndex, PdfDictionary annotDict) {
    ...
    clippingRects.Add(annotIndex, markedRectangles); 
    ...
}

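If a document has Redact annotations on multiple pages that sit at the same index in their respective page Annots arrays, different calls of this method try to add multiple entries with the same key to the clippingRects member. The .NET Dictionary class does not allow this and throws an exception.

Thus, iTextSharp redaction via Redact annotations only works for documents in which only a single page carries such annotations!

This feature was originally developed in Java, and in Java clippingRects is a HashMap, which allows entries to be overwritten, so no exception is thrown there. Furthermore, since the contents of clippingRects are only used in special cases (when RO or OverlayText is used in the Redact entry), the wrong entries usually do no harm, so the problem may not yet have been observed reproducibly.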
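
The difference between the two collection types can be reproduced without iText. The sketch below uses a plain Dictionary<int, string> as a stand-in for clippingRects (the string values are purely illustrative): Add rejects a duplicate key, whereas the indexer overwrites the entry, which is what Java's HashMap.put does.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateKeyDemo
{
    static void Main()
    {
        // Stand-in for clippingRects: key = index in the page's Annots array.
        var clippingRects = new Dictionary<int, string>();
        clippingRects.Add(0, "page 1, annotation 0");

        try
        {
            // Same annotation index on another page: Add throws.
            clippingRects.Add(0, "page 2, annotation 0");
        }
        catch (ArgumentException e)
        {
            Console.WriteLine("Add failed: " + e.GetType().Name);
        }

        // The indexer silently replaces the existing entry instead.
        clippingRects[0] = "page 2, annotation 0";
        Console.WriteLine(clippingRects[0]);
    }
}
```

Running it prints "Add failed: ArgumentException" followed by "page 2, annotation 0".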
