pdf - How to extract RichMediaContent from a PDF using iTextSharp

Question

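I need to extract a video file that is embedded in a PDF. I can find the video inside the page annotations, but I can't work out how to save it as a separate file. How can I do this?

For example, I have seen code that extracts file attachments from annotations; I need to extract videos in the same way.

Here is my code: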

    string FileName = AppDomain.CurrentDomain.BaseDirectory + "raven test.pdf";
    PdfReader pdfreader = new PdfReader(FileName);
    PdfDictionary PageDictionary = pdfreader.GetPageN(1);
    PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);       
    if ((Annots == null) || (Annots.Length == 0))
        return;

    foreach (PdfObject oAnnot in Annots.ArrayList)
    {
        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(oAnnot);

        if (AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.RICHMEDIA))
        {
            if (AnnotationDictionary.Keys.Contains(PdfName.RICHMEDIACONTENT))
            {
                PdfDictionary oRICHContent = AnnotationDictionary.GetAsDict(PdfName.RICHMEDIACONTENT); // Here I can see the embedded video, but it is inside the annotation; how do I save it to a file?
            }
        }

    }

score 1 · Accepted Answer

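For this you need to refer to the official Adobe Supplement to ISO 32000, BaseVersion 1.7, ExtensionLevel 3. Below is the basic code, although you will probably want to add more null checks. See the comments in the code for details. Note that not every embedded movie uses the RichMedia format; some are just special attachments, so this code won't find all of them.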

//FileName is the path to the PDF, as defined in the question
PdfReader pdfreader = new PdfReader(FileName);
PdfDictionary PageDictionary = pdfreader.GetPageN(1);
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
if ((Annots == null) || (Annots.Length == 0))
    return;

foreach (PdfObject oAnnot in Annots.ArrayList) {
    PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(oAnnot);

    //See if the annotation is a rich media annotation
    if (AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.RICHMEDIA)) {
        //See if it has content
        if (AnnotationDictionary.Contains(PdfName.RICHMEDIACONTENT)) {
            //Get the content dictionary
            PdfDictionary RMC = AnnotationDictionary.GetAsDict(PdfName.RICHMEDIACONTENT);
            if (RMC.Contains(PdfName.ASSETS)) {
                //Get the asset sub dictionary if it exists
                PdfDictionary Assets = RMC.GetAsDict(PdfName.ASSETS);
                //Get the names sub array.
                PdfArray names = Assets.GetAsArray(PdfName.NAMES);
                //Make sure the array exists and has values
                if (names != null && names.ArrayList.Count > 0) {
                    //A single piece of content can have multiple assets. The array returned is in the form {name, IR, name, IR, name, IR...}
                    //Step two entries at a time so that i is the name and i + 1 is its IndirectReference
                    for (int i = 0; i + 1 < names.ArrayList.Count; i += 2) {
                        //Get the IndirectReference for the current asset
                        PdfIndirectReference ir = (PdfIndirectReference)names.ArrayList[i + 1];
                        //Get the true object from the main PDF
                        PdfDictionary obj = (PdfDictionary)PdfReader.GetPdfObject(ir);
                        //Get the sub Embedded File object
                        PdfDictionary ef = obj.GetAsDict(PdfName.EF);
                        //Get the filespec sub object
                        PdfIndirectReference fir = (PdfIndirectReference)ef.Get(PdfName.F);
                        //Get the true file stream of the filespec
                        PRStream objStream = (PRStream)PdfReader.GetPdfObject(fir);
                        //Get the raw bytes for the given object
                        byte[] bytes = PdfReader.GetStreamBytes(objStream);
                        //Do something with the bytes here
                    }
                }
            }
        }
    }
}
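
The code above stops at "Do something with the bytes here". As a minimal sketch of that last step, the helper below writes one extracted asset to disk. `RichMediaAssetWriter` and `SaveAsset` are hypothetical names and not part of iTextSharp; the sketch assumes you already have the asset's name as a string and its bytes from `PdfReader.GetStreamBytes`.

```csharp
using System;
using System.IO;

public static class RichMediaAssetWriter
{
    // Hypothetical helper (not an iTextSharp API): writes one asset to disk
    // and returns the full path of the file that was written.
    public static string SaveAsset(string assetName, byte[] bytes, string outputDirectory)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        // The name comes from the PDF, so keep only its file-name part
        // rather than trusting any directory components it may contain.
        string safeName = Path.GetFileName(assetName ?? string.Empty);
        if (string.IsNullOrEmpty(safeName) || safeName == "." || safeName == "..")
            safeName = "asset.bin"; // fallback name, chosen arbitrarily

        // CreateDirectory does nothing if the directory already exists.
        Directory.CreateDirectory(outputDirectory);

        string fullPath = Path.Combine(outputDirectory, safeName);
        File.WriteAllBytes(fullPath, bytes);
        return fullPath;
    }
}
```

Inside the loop this could be called with the `bytes` array and the asset's name, which is the entry that precedes each indirect reference in `names` (for instance via `PdfString.ToUnicodeString()`, assuming that entry really is a string).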
