How can I extract FlateDecode-encoded images (such as PNG) from a PDF document with PDFSharp?

I found this comment in one of the PDFSharp samples:

// TODO: You can put the code here that converts vom PDF internal image format to a
// Windows bitmap
// and use GDI+ to save it in PNG format.
// [...]
// Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.

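Does anyone have a solution for this?

Thanks for any replies.

EDIT: Since I can't answer my own question within 8 hours, I'm adding this here:

Thanks for the quick reply.

I added some code to the method "ExportAsPngImage", but I didn't get the desired result. It merely extracts a few more images (PNG), and they have the wrong colors and are distorted.

Here is my current code: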

        PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
        byte[] decodedBytes = flate.Decode(bytes);

        System.Drawing.Imaging.PixelFormat pixelFormat;

        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = PixelFormat.Format8bppIndexed;
                break;
            case 24:
                pixelFormat = PixelFormat.Format24bppRgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
        int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        using (FileStream fs = new FileStream(@"C:\Export\PdfSharp\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
        {
            bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
        }

Is this the right approach, or should I go about it differently? Thanks a lot!

6 Answers

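I know this answer may come a few years late, but maybe it will help others.

In my case the distortion occurred because image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent) does not seem to return the correct value. And, as Vive la déraison pointed out under your question, the bytes end up in BGR order when you use Marshal.Copy. Reversing the bytes before the Marshal.Copy and rotating the bitmap by 180° afterwards does the job.

The resulting code looks like this: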

private static void ExportAsPngImage(PdfDictionary image, ref int count)
    {
        int width = image.Elements.GetInteger(PdfImage.Keys.Width);
        int height = image.Elements.GetInteger(PdfImage.Keys.Height);

        var canUnfilter = image.Stream.TryUnfilter();
        byte[] decodedBytes;

        if (canUnfilter)
        {
            decodedBytes = image.Stream.Value;
        }
        else
        {
            PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
            decodedBytes = flate.Decode(image.Stream.Value);
        }

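        // Infer the bit depth from the size of the decoded data instead of trusting
        // /BitsPerComponent. Note that this value is really the number of bits per pixel.
        // The upper bound keeps the loop from running forever when no depth matches;
        // the switch below then throws for the unknown format.
        int bitsPerComponent = 0;
        while (bitsPerComponent <= 64 && decodedBytes.Length - ((width * height) * bitsPerComponent / 8) != 0)
        {
            bitsPerComponent++;
        }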

        System.Drawing.Imaging.PixelFormat pixelFormat;
        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                break;
            case 16:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format16bppArgb1555;
                break;
            case 24:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                break;
            case 32:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
                break;
            case 64:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format64bppArgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        decodedBytes = decodedBytes.Reverse().ToArray();

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
        int length = (int)Math.Ceiling(width * (bitsPerComponent / 8.0));
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            // IntPtr.Add also works in 64-bit processes, where Scan0.ToInt32() can overflow.
            Marshal.Copy(decodedBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        bmp.RotateFlip(RotateFlipType.Rotate180FlipNone);
        bmp.Save(String.Format("exported_Images\\Image{0}.png", count++), System.Drawing.Imaging.ImageFormat.Png);
    }

The code may need some optimization, but in my case it did export the FlateDecoded images correctly.
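
For tightly packed 24-bit data, reversing the whole buffer and then rotating the bitmap by 180° has the same net effect as swapping the first and third byte of every pixel. Here is a minimal sketch of that swap, assuming 3 bytes per pixel and no row padding in the decoded data. The class and method names are made up for illustration and are not part of PDFSharp.

```csharp
static class ChannelOrder
{
    // Swaps the first and third byte of every 3-byte pixel in place (RGB <-> BGR).
    // Assumption: the buffer holds tightly packed 24-bit pixels.
    public static void SwapRedAndBlue(byte[] pixels)
    {
        for (int i = 0; i + 2 < pixels.Length; i += 3)
        {
            byte tmp = pixels[i];
            pixels[i] = pixels[i + 2];
            pixels[i + 2] = tmp;
        }
    }
}
```

If this is called on the decoded bytes before the row-by-row Marshal.Copy, the Reverse() and RotateFlip() steps should no longer be needed for 24-bit images.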

answered 2019-08-01T07:21:20.953

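To get a Windows BMP, you only need to create a bitmap header and then copy the image data into the bitmap. PDF images are byte-aligned (every new row starts on a byte boundary), whereas Windows BMPs are DWORD-aligned (every new row starts on a DWORD boundary; for historical reasons a DWORD is 4 bytes). All the information needed for the bitmap header can be found in the filter parameters or can be computed.

The color palette is another FlateEncoded object in the PDF. You copy that into the BMP as well.

This has to be done for several formats (1 bit per pixel, 8 bpp, 24 bpp, 32 bpp).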
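
The alignment difference can be sketched with plain byte arrays. This is only an illustration of the row arithmetic, not a complete BMP writer; the class and method names are hypothetical, and it ignores the bitmap header, the palette, and the channel order.

```csharp
using System;

static class RowAlignment
{
    // Bytes per row in the decoded PDF stream: each row is padded to a whole byte.
    public static int PdfRowBytes(int width, int bitsPerPixel)
    {
        return (width * bitsPerPixel + 7) / 8;
    }

    // Bytes per row in a Windows bitmap: each row is padded to a multiple of 4 bytes.
    public static int BmpStride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Copies byte-aligned rows into a DWORD-aligned buffer; padding bytes stay zero.
    public static byte[] ToDwordAligned(byte[] pdfData, int width, int height, int bitsPerPixel)
    {
        int srcRow = PdfRowBytes(width, bitsPerPixel);
        int dstRow = BmpStride(width, bitsPerPixel);
        var result = new byte[dstRow * height];
        for (int y = 0; y < height; y++)
        {
            Buffer.BlockCopy(pdfData, y * srcRow, result, y * dstRow, srcRow);
        }
        return result;
    }
}
```

For example, a 3-pixel-wide 24 bpp image has 9 bytes per row in the PDF stream but 12 bytes per row in the bitmap, so copying the whole buffer in one go would shear the image.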

answered 2012-04-05T08:33:00.633

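Here is the complete code I use to do this.

I'm extracting UPS shipping labels from PDFs, so I know the format in advance. If the images you extract are of unknown type, you'll need to check bitsPerComponent and handle it accordingly. I also only process the first image on the first page.

Note: I'm using TryUnfilter for the "deflate" step; it applies whichever filter was used and decodes the data in place for me. There's no need to call "Deflate" explicitly.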

    var file = @"c:\temp\PackageLabels.pdf";

    var doc = PdfReader.Open(file);
    var page = doc.Pages[0];

    {
        // Get resources dictionary
        PdfDictionary resources = page.Elements.GetDictionary("/Resources");
        if (resources != null)
        {
            // Get external objects dictionary
            PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
            if (xObjects != null)
            {
                ICollection<PdfItem> items = xObjects.Elements.Values;

                // Iterate references to external objects
                foreach (PdfItem item in items)
                {
                    PdfReference reference = item as PdfReference;
                    if (reference != null)
                    {
                        PdfDictionary xObject = reference.Value as PdfDictionary;
                        // Is external object an image?
                        if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                        {
                            // do something with your image here 
                            // only the first image is handled here
                            var bitmap = ExportImage(xObject);
                            bitmap.Save(@"c:\temp\exported.png", System.Drawing.Imaging.ImageFormat.Png);
                        }
                    }
                }
            }
        }
    }

It uses these helper functions:

    private static Bitmap ExportImage(PdfDictionary image)
    {
        string filter = image.Elements.GetName("/Filter");
        switch (filter)
        {
            case "/FlateDecode":
                return ExportAsPngImage(image);

            default:
                throw new ApplicationException(filter + " filter not implemented");
        }
    }

    private static Bitmap ExportAsPngImage(PdfDictionary image)
    {
        int width = image.Elements.GetInteger(PdfImage.Keys.Width);
        int height = image.Elements.GetInteger(PdfImage.Keys.Height);
        int bitsPerComponent = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);   

        var canUnfilter = image.Stream.TryUnfilter();
        var decoded = image.Stream.Value;

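        // Assumes 8 bits per pixel, which is known in advance for these labels.
        Bitmap bmp = new Bitmap(width, height, System.Drawing.Imaging.PixelFormat.Format8bppIndexed);
        BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
        // Copy row by row: PDF rows are byte-aligned, while bitmap rows are padded to bmpData.Stride.
        for (int y = 0; y < height; y++)
        {
            Marshal.Copy(decoded, y * width, IntPtr.Add(bmpData.Scan0, y * bmpData.Stride), width);
        }
        bmp.UnlockBits(bmpData);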

        return bmp;
    }

answered 2017-08-09T05:06:48.490

So far... my code... it works for many PNG files, but not for files from Adobe Photoshop that use an Indexed color space:

    private bool ExportAsPngImage(PdfDictionary image, string SaveAsName)
        {
            int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
            int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
            int bitsPerComponent = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.BitsPerComponent);
            var ColorSpace = image.Elements.GetArray(PdfImage.Keys.ColorSpace);
            System.Drawing.Imaging.PixelFormat pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb; // 24bpp is just an initial value

            if (ColorSpace is null) // No color space array. The bitmap expects BGR order instead of RGB, so the byte order is swapped below. Right now this works.
            {
                byte[] origineel_byte_boundary = image.Stream.UnfilteredValue;
                bitsPerComponent = (origineel_byte_boundary.Length) / (width * height);
                switch (bitsPerComponent)
                {
                    case 4:
                        pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppPArgb;
                        break;
                    case 3:
                        pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                        break;
                    default:
                        {
                            MessageBox.Show("Unknown pixel format " + bitsPerComponent, "Error", MessageBoxButtons.OK, MessageBoxIcon.Warning);
                            return false;
                        }
                        break;
                }
                Bitmap bmp = new Bitmap(width, height, pixelFormat); //copy raw bytes to "master" bitmap so we are out of pdf format to work with 
                System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, width, height), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat);
                System.Runtime.InteropServices.Marshal.Copy(origineel_byte_boundary, 0, bmd.Scan0, origineel_byte_boundary.Length);
                bmp.UnlockBits(bmd);
                Bitmap bmp2 = new Bitmap(width, height, pixelFormat);
                for (int indicex = 0; indicex < bmp.Width; indicex++)
                {
                    for (int indicey = 0; indicey < bmp.Height; indicey++)
                    {
                        Color nuevocolor = bmp.GetPixel(indicex, indicey);
                        Color colorintercambiado = Color.FromArgb(nuevocolor.A, nuevocolor.B, nuevocolor.G, nuevocolor.R);
                        bmp2.SetPixel(indicex, indicey, colorintercambiado);
                    }
                }
                using (FileStream fs = new FileStream(SaveAsName, FileMode.Create, FileAccess.Write))
                {
                    bmp2.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
                }
                bmp2.Dispose();
                bmp.Dispose();
            }
            else
            {
                // This is the Photoshop case; work still needs to be done here. I'm able to get the color palette, but I have no idea how to put it back or create the PNG file.
                switch (bitsPerComponent)
                {
                    case 4:
                        pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
                        break;
                    default:
                        {
                            MessageBox.Show("Unknown pixel format " + bitsPerComponent, "Error", MessageBoxButtons.OK, MessageBoxIcon.Warning);
                            return false;
                        }
                        break;
                }
                if ((ColorSpace.Elements.GetName(0) == "/Indexed") && (ColorSpace.Elements.GetName(1) == "/DeviceRGB"))
                {
                    //we need to create the palette
                    int paletteColorCount = ColorSpace.Elements.GetInteger(2);
                    List<System.Drawing.Color> paletteList = new List<Color>();
                    //Color[] palette = new Color[paletteColorCount+1]; // element 2 of an /Indexed color space is the highest valid index (hival), so the palette has one more entry than that value
                    PdfObject paletteObj = ColorSpace.Elements.GetObject(3);
                    PdfDictionary paletteReference = (PdfDictionary)paletteObj;
                    byte[] palettevalues = paletteReference.Stream.Value;
                    for (int index = 0; index < (paletteColorCount + 1); index++)
                    {
                        //palette[index] = Color.FromArgb(1, palettevalues[(index*3)], palettevalues[(index*3)+1], palettevalues[(index*3)+2]); // RGB
                        paletteList.Add(Color.FromArgb(255, palettevalues[(index * 3)], palettevalues[(index * 3) + 1], palettevalues[(index * 3) + 2])); // opaque RGB entry (an alpha of 1 would be almost fully transparent)
                    }                  
                }
            }
            return true;
        }
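
One possible way to finish the indexed case is to skip the GDI+ palette entirely and expand every palette index into an RGB triple, which can then go through the same 24 bpp path as the other branch. The following is only a sketch under stated assumptions: 8 bits per component, a /DeviceRGB base color space, and palette bytes that are already unfiltered. The class and method names are hypothetical.

```csharp
using System;

static class IndexedImage
{
    // Expands 8-bit palette indices into tightly packed RGB triples.
    // 'palette' holds (hival + 1) entries of 3 bytes each, in R, G, B order.
    public static byte[] ExpandToRgb(byte[] indices, byte[] palette)
    {
        int colorCount = palette.Length / 3;
        var rgb = new byte[indices.Length * 3];
        for (int i = 0; i < indices.Length; i++)
        {
            int index = indices[i];
            if (index >= colorCount)
            {
                throw new ArgumentException("Pixel " + i + " references palette entry " + index + ", which is out of range.");
            }
            Buffer.BlockCopy(palette, index * 3, rgb, i * 3, 3);
        }
        return rgb;
    }
}
```

The result is still in RGB order, so the red/blue swap from the first branch applies here as well before the data is written to a Format24bppRgb bitmap.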

answered 2021-11-12T11:53:40.923

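A PDF may contain images with masks and various color space options, which is why simply decoding the image object may not work properly in some cases.

So the code also needs to check for image masks (/ImageMask) and other properties of the image object (to see whether the image should also use inverted colors or indexed colors) in order to recreate the image as it is displayed in the PDF. See the sections on image objects, /ImageMask, and the /Decode entry in the official PDF Reference.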
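
As an illustration of the inverted-colors case, a /Decode pair maps each raw sample linearly between its two values, so [1 0] inverts a grayscale image. Here is a rough sketch for 8-bit, single-component samples. The class and method names are made up for this example, and reading the /Decode values out of the image dictionary is left out.

```csharp
using System;

static class DecodeArray
{
    // Maps 8-bit samples through a /Decode pair [dMin dMax] and rescales to 0..255.
    // Assumption: one color component whose valid range is 0.0 to 1.0 (e.g. DeviceGray).
    public static byte[] Apply(byte[] samples, double dMin, double dMax)
    {
        var result = new byte[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            double value = dMin + samples[i] * (dMax - dMin) / 255.0;
            double clamped = Math.Min(1.0, Math.Max(0.0, value));
            result[i] = (byte)Math.Round(clamped * 255.0);
        }
        return result;
    }
}
```

With the default pair [0 1] the samples come back unchanged; with [1 0] black and white are exchanged.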

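I'm not sure whether PDFSharp can find image mask objects in a PDF, but iTextSharp can access them (see the PdfName.MASK object type).

Commercial tools such as PDF Extractor SDK can extract images both in their original form and "as rendered".

I work for ByteScout, the maker of PDF Extractor SDK.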

answered 2015-04-13T10:28:50.600

This may not answer the question directly, but another option for extracting images from a PDF is FreeSpire.PDF, which can do it easily. It is available as a NuGet package: https://www.nuget.org/packages/FreeSpire.PDF/. It handles all the image formats and can export them as PNG. Their sample code is:

using System;
using System.Collections.Generic;
using System.Text;
using System.Drawing;
using Spire.Pdf;

namespace ExtractImagesFromPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            //Instantiate an object of Spire.Pdf.PdfDocument
            PdfDocument doc = new PdfDocument();
            //Load a PDF file 
            doc.LoadFromFile("sample.pdf");
            List<Image> ListImage = new List<Image>();
            for (int i = 0; i < doc.Pages.Count; i++)
            {
                // Get an object of Spire.Pdf.PdfPageBase
                PdfPageBase page = doc.Pages[i];
                // Extract images from Spire.Pdf.PdfPageBase
                Image[] images = page.ExtractImages();
                if (images != null && images.Length > 0)
                {
                    ListImage.AddRange(images);
                }

            }
            if (ListImage.Count > 0)
            {
                for (int i = 0; i < ListImage.Count; i++)
                {
                    Image image = ListImage[i];
                    image.Save("image" + (i + 1).ToString() + ".png", System.Drawing.Imaging.ImageFormat.Png);
                }
                System.Diagnostics.Process.Start("image1.png");
            }  
        }
    }
}

(code taken from https://www.e-iceblue.com/Tutorials/Spire.PDF/Spire.PDF-Program-Guide/How-to-Extract-Image-From-PDF-in-C.html)

answered 2018-02-07T04:50:27.123