
I am using the following code (Kotlin) to find hyperlinks in a PDF:

    import org.apache.pdfbox.pdmodel.PDDocument
    import org.apache.pdfbox.pdmodel.interactive.action.PDActionURI
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink
    import ... destination.PDPageXYZDestination
    import java.io.File
    
    fun findAnnotationsTest() {
        val pdfPath = "LinkedPDF.pdf"
        val doc = PDDocument.load(File(pdfPath))
        var pageNo = 0
        for (page in doc.pages) {
            pageNo++
            for (annotation in page.annotations) {
                val subtype = annotation.subtype
                println("Found Annotation ($subtype) on page $pageNo")
                if (annotation is PDAnnotationLink) {
                    val aname = annotation.annotationName
                    println("\t\tfound Link  named $aname on page $pageNo")
                    val link = annotation
                    println("\t\tas string: " + link.toString());
                    println("\t\tdestination: " + link.getDestination());
                    val dest = link.destination
                    val destClass = dest::class
                    println("\t\tdest class is $destClass")
                    if(dest is PDPageXYZDestination){
                        val pageNumber = dest.pageNumber
                        println("\t\tdest page number is $pageNumber")
                    }
    
                    val action = link.action
    
                    if (action == null) {
                        println("\t\tbut action is null")
                        continue
                    }
                    if (action is PDActionURI)
                        println("\t\tURI action is ${action.uri}")
                    else
                        println("\t\tother action is ${action::class}")
                }
                else{
                    println("\tNOT a link")
                }
            }
        }
    }

The input file has hundreds of (working) internal links.

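This code finds the annotations and recognizes them as links, but the PDAction of each link is null and the page number of its PDPageXYZDestination is -1. The output for every link looks like this: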

    Found Annotation (Link) on page 216
        found Link (Link) named null on page 216
        as string: org.apache.pdfbox....annotation.PDAnnotationLink@3234e239
        destination: org.apache.pdfbox.....destination.PDPageXYZDestination@3d921e20
        dest class is class org.apache.pdfbox...destination.PDPageXYZDestination
        dest page number is -1
        but action is null

By the way, the PDF was created by saving an MS Word document (with internal links to Word bookmarks) as a PDF.

Any ideas about what I am doing wrong?

Here is the PDF (sample): NBSample.pdf


1 Answer


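The destination stored in a PDPageDestination is not a number (that is only the case for links to pages in external documents); it is a page dictionary, so extra effort is needed to get the number (the method's javadoc mentions this). Here is a slightly modified excerpt from the PrintBookmarks.java example: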

    if (dest instanceof PDPageDestination)
    {
        PDPageDestination pd = (PDPageDestination) dest;
        System.out.println("Destination page: " + (pd.retrievePageNumber() + 1));
    }
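
For the Kotlin loop in the question, the same idea might look like the sketch below. It assumes PDFBox 2.x (where `PDDocument.load` exists) and reuses the file name from the question; the function name `printLinkTargets` is made up for illustration. It also falls back to a GoTo action when a link has no direct destination, which the sample output above does not need but other PDFs may.

```kotlin
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.pdmodel.interactive.action.PDActionGoTo
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination
import java.io.File

// Sketch only: assumes PDFBox 2.x; printLinkTargets is a hypothetical helper name.
fun printLinkTargets(pdfPath: String = "LinkedPDF.pdf") {
    // use {} closes the document even if an exception is thrown.
    PDDocument.load(File(pdfPath)).use { doc ->
        doc.pages.forEachIndexed { index, page ->
            for (annotation in page.annotations) {
                if (annotation !is PDAnnotationLink) continue

                // A link carries its target either directly or inside a GoTo action.
                val dest = annotation.destination
                    ?: (annotation.action as? PDActionGoTo)?.destination

                if (dest is PDPageDestination) {
                    // retrievePageNumber() resolves the page dictionary against the
                    // page tree; the answer's excerpt adds 1 for display, as done here.
                    val target = dest.retrievePageNumber()
                    if (target >= 0) {
                        println("Link on page ${index + 1} -> page ${target + 1}")
                    } else {
                        println("Link on page ${index + 1}: target page could not be resolved")
                    }
                }
            }
        }
    }
}
```

The only change that matters compared with the question's code is calling `retrievePageNumber()` instead of reading `pageNumber`, which stays at -1 when the destination holds a page dictionary rather than a number.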

answered 2021-10-28T17:18:34.217