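I am copying annotated PDF pages from one document to another. The strange thing I ran into is that, in the new document, I cannot access the parent of the PdfPopupAnnotations: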

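import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfPopupAnnotation;

import java.io.IOException;

public class CopyPdfTest {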
    public static void main(String[] args) throws IOException {
        PdfDocument inputDoc = new PdfDocument(new PdfReader("src/test/resources/input.pdf"));
        PdfDocument outputDoc = new PdfDocument(new PdfWriter("/tmp/output.pdf"));

        // Copy pages
        for (int i = 1; i <= inputDoc.getNumberOfPages(); i++) {
            inputDoc.copyPagesTo(i, i, outputDoc);
        }

        // Re-open outputDoc to eliminate the possibility the problem stems from
        // it being opened in writing mode
        outputDoc.close();
        outputDoc = new PdfDocument(new PdfReader("/tmp/output.pdf"));

        // Step through the PdfPopupAnnotations in both documents and check for their parents
        for (PdfDocument doc : new PdfDocument[] { inputDoc, outputDoc } ) {
            for (int i = 1; i <= doc.getNumberOfPages(); i++) {
                for (PdfAnnotation annot : doc.getPage(i).getAnnotations()) {
                    if (annot instanceof PdfPopupAnnotation) {
                        // This prints null for popups from the outputDoc
                        System.out.println(((PdfPopupAnnotation) annot).getParentObject());
                    }
                }
            }
        }
    }
}

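When run on a PDF with a single /Square annotation, this produces the following output (the first line is the popup annotation's parent from the original PDF, the second line is the null printed for the output PDF):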

<</AP <</N 10 0 R >> /C [0.898026 0.133331 0.215683 ] /Contents test /CreationDate D:20180107105025+01'00' /F 4 /M D:20180107105029+01'00' /NM 8a233cc7-ed2f-48bf-91f2-a46cecf15160 /P 9 0 R /Popup 16 0 R /RC <?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:18.9.0" xfa:spec="2.0.2" ><p dir="ltr"><span dir="ltr" style="font-size:10.5pt;text-align:left;color:#000000;font-weight:normal;font-style:normal">test</span></p></body> /RD [0.5 0.5 0.5 0.5 ] /Rect [84.7495 636.205 191.876 764.21 ] /Subj Rectangle /Subtype /Square /T tom /Type /Annot >>
null

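I find this especially odd because, looking at the uncompressed sample PDFs, the parent reference 4 0 R is unchanged and the referenced /Square annotation is still present as 4 0 obj.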

input.pdf

%PDF-1.4
%âãÏÓ
5 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj 
6 0 obj 
<<
/FormType 1
/Subtype /Form
/Type /XObject
/BBox [115.975 693.768 179.508 827.883]
/Length 69
/Matrix [1 0 0 1 -115.975 -693.768]
>>
stream
1.000 0.000 0.000 RG
2 w
0 J
0 j
116.975 694.768 61.534 132.115 re
S

endstream 
endobj 
4 0 obj 
<<
/Subtype /Square
/RD [0 0 0 0]
/RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.0" xfa:spec="2.0.2"><p dir="ltr"><span style="text-align:left;font-size:13pt;font-style:normal;font-weight:normal;color:#000000;font-family:Arial">test</span></p></body>)
/T (thw)
/Contents (test)
/Rect [115.975 693.768 179.508 827.883]
/CA 1
/P 3 0 R
/M (D:20180107100342+01'00')
/Type /Annot
/NM (fd33d765-e844-4226-aff8-3ef81361e787)
/F 4
/BS 
<<
/W 2
/S /S
>>
/AP 
<<
/N 6 0 R
>>
/C [1 0 0]
/Popup 5 0 R
/Subj (Rectangle)
/CreationDate (D:20180107100338+01'00')
>>
endobj 
8 0 obj 
<<
/OPM 1
/Type /ExtGState
>>
endobj 
7 0 obj 
<<
/R7 8 0 R
>>
endobj 
9 0 obj 
<<
/Length 30
>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
Q

endstream 
endobj 
3 0 obj 
<<
/pdftk_PageNum 1
/Annots [4 0 R 5 0 R]
/Resources 
<<
/ProcSet [/PDF]
/ExtGState 7 0 R
>>
/Type /Page
/Parent 1 0 R
/Contents 9 0 R
/MediaBox [0 0 595 842]
>>
endobj 
1 0 obj 
<<
/Kids [3 0 R]
/Type /Pages
/Count 1
>>
endobj 
11 0 obj 
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj 
12 0 obj 
<<
/ModDate (D:20180107101601+01'00')
/CreationDate (D:20180107101601+01'00')
/Creator (pdftk 2.02 - www.pdftk.com)
/Producer (itext-paulo-155 \(itextpdf.sf.net-lowagie.com\))
>>
endobj xref
0 13
0000000000 65535 f 
0000001472 00000 n 
0000000000 65535 f 
0000001293 00000 n 
0000000460 00000 n 
0000000015 00000 n 
0000000220 00000 n 
0000001177 00000 n 
0000001130 00000 n 
0000001210 00000 n 
0000000000 65535 f 
0000001531 00000 n 
0000001583 00000 n 
trailer

<<
/Info 12 0 R
/ID [<23bde7d1ea6b4f52b55dc534b36f8d41><e031fe688c87cb2303e0a99487c3025e>]
/Root 11 0 R
/Size 13
>>
startxref
1779
%%EOF

output.pdf

%PDF-1.7
%âãÏÓ
5 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj 
6 0 obj 
<<
/FormType 1
/Subtype /Form
/Type /XObject
/BBox [115.975 693.768 179.508 827.883]
/Length 69
/Matrix [1 0 0 1 -115.975 -693.768]
>>
stream
1.000 0.000 0.000 RG
2 w
0 J
0 j
116.975 694.768 61.534 132.115 re
S

endstream 
endobj 
4 0 obj 
<<
/Subtype /Square
/RD [0 0 0 0]
/RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.0" xfa:spec="2.0.2"><p dir="ltr"><span style="text-align:left;font-size:13pt;font-style:normal;font-weight:normal;color:#000000;font-family:Arial">test</span></p></body>)
/T (thw)
/Contents (test)
/Rect [115.975 693.768 179.508 827.883]
/CA 1
/P 3 0 R
/M (D:20180107100342+01'00')
/Type /Annot
/NM (fd33d765-e844-4226-aff8-3ef81361e787)
/F 4
/BS 
<<
/W 2
/S /S
>>
/AP 
<<
/N 6 0 R
>>
/C [1 0 0]
/Popup 5 0 R
/Subj (Rectangle)
/CreationDate (D:20180107100338+01'00')
>>
endobj 
7 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Open false
/Type /Annot
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj 
9 0 obj 
<<
/OPM 1
/Type /ExtGState
>>
endobj 
8 0 obj 
<<
/R7 9 0 R
>>
endobj 
10 0 obj 
<<
/Length 30
>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
Q

endstream 
endobj 
3 0 obj 
<<
/pdftk_PageNum 1
/Annots [4 0 R 7 0 R]
/Resources 
<<
/ProcSet [/PDF]
/ExtGState 8 0 R
>>
/Contents 10 0 R
/Parent 1 0 R
/Type /Page
/MediaBox [0 0 595 842]
>>
endobj 
1 0 obj 
<<
/Kids [3 0 R]
/Type /Pages
/Count 1
>>
endobj 
12 0 obj 
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj 
13 0 obj 
<<
/ModDate (D:20180107101427+01'00')
/CreationDate (D:20180107101427+01'00')
/Creator (pdftk 2.02 - www.pdftk.com)
/Producer (itext-paulo-155 \(itextpdf.sf.net-lowagie.com\))
>>
endobj xref
0 14
0000000000 65535 f 
0000001665 00000 n 
0000000000 65535 f 
0000001485 00000 n 
0000000460 00000 n 
0000000015 00000 n 
0000000220 00000 n 
0000001130 00000 n 
0000001368 00000 n 
0000001321 00000 n 
0000001401 00000 n 
0000000000 65535 f 
0000001724 00000 n 
0000001776 00000 n 
trailer

<<
/Info 13 0 R
/ID [<09baf689039bb6015d4c428111e4ee72><684b5613b1931e88255384276dcaceb1>]
/Root 12 0 R
/Size 14
>>
startxref
1972
%%EOF

Any hints on why this happens and how to make the parent accessible to iText?

1 Answer

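Unfortunately, the OP did not provide the PDF files in binary form, so I cannot simply verify the following. Looking at the data, however, the difference is obvious...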

The popup object in your input.pdf has a Parent entry:

5 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj

The popup object in your output.pdf, on the other hand, does not:

7 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Open false
/Type /Annot
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>

This also matches the iText 7 code of the getParent method:

public PdfDictionary getParentObject() {
    return getPdfObject().getAsDictionary(PdfName.Parent);
}

public PdfAnnotation getParent() {
    if (parent == null) {
        parent = makeAnnotation(getParentObject());
    }
    return parent;
}

So, to make the parent accessible to iText, make sure the popup annotation has a Parent entry!


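Yes, I know, the entry is optional. But getParent does not claim to determine the actual parent object; it merely returns the object referenced by the Parent entry.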


Another issue in your output.pdf:

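  • The page object clearly states that its annotations are in objects 4 and 7;
  • but the annotation in 4 references the annotation in 5 as its popup, an annotation that is not even associated with the page;
  • the popup in 5 (the one not associated with the page) references 4 as its parent; the popup in 7 (the one associated with the page) has no Parent entry.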

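When analyzing the file, you probably did not look at the page's annotations but only at the popup/parent relationships between the annotation objects, and therefore assumed that your popup had a Parent entry...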
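
The check described above, starting from the page's /Annots array rather than from the annotation objects, can be sketched in plain Java. This is only an illustrative model, not iText API: the class PopupLinkCheck, its nested Annot type, and the check method are hypothetical names, and the object numbers are copied by hand from the two dumps in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of annotation dictionaries; hypothetical, not iText API. */
final class PopupLinkCheck {

    /** Simplified annotation: /Subtype plus the object numbers of /Popup and /Parent (null if absent). */
    static final class Annot {
        final String subtype;
        final Integer popup;
        final Integer parent;

        Annot(String subtype, Integer popup, Integer parent) {
            this.subtype = subtype;
            this.popup = popup;
            this.parent = parent;
        }
    }

    /**
     * Walks the page's /Annots array and reports popups without a /Parent
     * entry and /Popup references that point outside that array.
     */
    static List<String> check(Map<Integer, Annot> objects, List<Integer> pageAnnots) {
        List<String> issues = new ArrayList<>();
        for (Integer nr : pageAnnots) {
            Annot annot = objects.get(nr);
            if (annot == null) {
                issues.add("object " + nr + " is listed in /Annots but not defined");
            } else if ("Popup".equals(annot.subtype)) {
                if (annot.parent == null) {
                    issues.add("popup " + nr + " has no /Parent entry");
                }
            } else if (annot.popup != null && !pageAnnots.contains(annot.popup)) {
                issues.add("annotation " + nr + " references popup " + annot.popup
                        + ", which is not in the page's /Annots");
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        // input.pdf: /Annots [4 0 R 5 0 R]
        Map<Integer, Annot> input = new HashMap<>();
        input.put(4, new Annot("Square", 5, null));
        input.put(5, new Annot("Popup", null, 4));
        System.out.println("input.pdf:  " + check(input, Arrays.asList(4, 5)));

        // output.pdf: /Annots [4 0 R 7 0 R]; object 5 still exists but is orphaned
        Map<Integer, Annot> output = new HashMap<>();
        output.put(4, new Annot("Square", 5, null));
        output.put(5, new Annot("Popup", null, 4));
        output.put(7, new Annot("Popup", null, null));
        System.out.println("output.pdf: " + check(output, Arrays.asList(4, 7)));
    }
}
```

For the input.pdf model the list of issues is empty. For the output.pdf model it contains two entries: annotation 4 points at popup 5, which the page does not list, and popup 7, which the page does list, has no /Parent entry.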

Answered 2018-01-07T22:12:16.330