pdf - How can I programmatically export PDF annotations (e.g. formulas enclosed in rectangles) as images?

Question

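I would like to know whether some annotations can be exported as images. I already know how to export highlighted text as text, but that does not work for equations. If the equations are marked by annotations, e.g. boxes drawn around them, can I convert them all to images in one go, the way the PDF snapshot tool does for a single selection?

Doing each one by hand with the PDF snapshot tool is easy. Does any PDF library or program offer a way to take image snapshots programmatically, not of whole pages but of individual equations marked in some way with annotations?

For the purposes of this question, the programs do not have to be free. Thanks.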

score 2 · Accepted Answer

Here is a complete Ruby-based solution using the Ruby gems pdf-reader and rmagick (with ImageMagick installed).

require 'pdf-reader'
require 'rmagick'

pdf_file_name='statmech' #without extension
doc = PDF::Reader.new(File.expand_path(pdf_file_name+".pdf"))
$objects = doc.objects

def convertpagetojpgandcrop(filename,pagenum,croprect,imgname)
   pagename = filename+".pdf[#{pagenum-1}]"
   #higher density used for quality purposes (otherwise fuzzy)
   pageim = Magick::Image.read(pagename){ |opts| opts.density = 216}.first
   #factors of 3 needed because 216 dpi is 3x the default 72 dpi TODO: generalize to pdf density!=72
   #SouthWestGravity puts coordinate origin in bottom left to match pdf coords
   eqim = pageim.crop(Magick::SouthWestGravity,
      3*croprect[0],3*croprect[1],3*croprect[2]-3*croprect[0],3*croprect[3]-3*croprect[1])
   eqim.write(imgname)
end

def is_square?(object)
   object[:Type] == :Annot && object[:Subtype] == :Square
end
def is_highlight?(object)
   object[:Type] == :Annot && object[:Subtype] == :Highlight
end

def annots_on_page(page)
   references = (page.attributes[:Annots] || [])
   lookup_all(references).flatten
end

def lookup_all(refs)
   refs = *refs
   refs.map { |ref| lookup(ref) }
end

def lookup(ref)
   object = $objects[ref]
   return object unless object.is_a?(Array)
   lookup_all(object)
end

def highlights_on_page(page)
   all_annots = annots_on_page(page)
   all_annots.select { |a| is_highlight?(a) }
end

def squares_on_page(page)
   all_annots = annots_on_page(page)
   all_annots.select { |a| is_square?(a) }
end

def restricted_annots_on_page(page)
   all_annots = annots_on_page(page)
   all_annots.select { |a| is_square?(a) || is_highlight?(a) }
end
#This block exports a jpg for each 'square' annotation in pdf
doc.pages.each do |page|
   eqnum = 0
   all_squares = squares_on_page(page)
   all_squares.each do |annot|
      eqnum = eqnum+1
      puts "#{annot[:Rect]}"
      convertpagetojpgandcrop(pdf_file_name,page.number,annot[:Rect],
         pdf_file_name+"page#{page.number}eq#{eqnum}.jpg")
   end
end

#This block gives the text of the highlights and wikilinks to the images
#TODO:(needs to go in text file)
doc.pages.each do |page|
   eqnum = 0
   annots = restricted_annots_on_page(page)
   if annots.length>0
      puts "# Page #{page.number}"
   end
   annots.each do |annot|
      if is_square?(annot)
         eqnum = eqnum+1
         puts "{{wiki:#{pdf_file_name}page#{page.number}eq#{eqnum}.jpg}}"
      else
         puts "#{annot[:Contents]}"
      end
   end
end

This code extends sample code for the pdf-reader and rmagick gems that I found online. A few lines are original.
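The TODO in the crop step ("generalize to pdf density!=72") can be sketched as a small pure-Ruby helper. This is only a sketch, not part of the original answer: the name `crop_box_for_rect` and the `density:` keyword are made up for illustration, and it assumes the page's coordinate origin is at the bottom-left corner (MediaBox starting at 0,0), no page rotation, and the default PDF user unit of 1/72 inch. It also normalizes the rectangle, since a /Rect may list its corners in either order.

```ruby
# Hypothetical helper (not from pdf-reader or rmagick): converts a PDF /Rect,
# given in points with the origin at the bottom-left, into x, y, width, height
# in pixels for a page rasterized at `density` dpi.
PDF_POINTS_PER_INCH = 72.0

def crop_box_for_rect(rect, density: 216)
  x1, y1, x2, y2 = rect.map(&:to_f)
  scale = density / PDF_POINTS_PER_INCH

  # Normalize, because the two corners may come in either order.
  left, right = [x1, x2].minmax
  bottom, top = [y1, y2].minmax

  # Round outward so the crop never cuts into the annotated area.
  [
    (left * scale).floor,
    (bottom * scale).floor,
    ((right - left) * scale).ceil,
    ((top - bottom) * scale).ceil
  ]
end

p crop_box_for_rect([100, 200, 300, 250])                # prints [300, 600, 600, 150]
p crop_box_for_rect([300, 250, 100, 200], density: 144)  # prints [200, 400, 400, 100]
```

Inside `convertpagetojpgandcrop`, the four returned values could presumably be passed to `pageim.crop(Magick::SouthWestGravity, x, y, w, h)` in place of the hard-coded factors of 3, provided the same density is used when reading the page.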

score 1 · Accepted Answer

This code sample uses Amyuni PDF Creator .Net. It exports the page with only one annotation visible at a time:

using System;
using System.Collections;
using System.Drawing;
using System.IO;
using Amyuni.PDFCreator;
//open a pdf document
FileStream testfile = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
IacDocument document = new IacDocument(null);
document.SetLicenseKey("your license", "your code");
document.Open(testfile, "");

document.CurrentPageNumber = 1;
IacAttribute attribute = document.CurrentPage.AttributeByName("Objects");

// listobj is an array list of objects
ArrayList listobj = (System.Collections.ArrayList)attribute.Value;
ArrayList annotations = new ArrayList();
foreach (Amyuni.PDFCreator.IacObject iacObj in listobj)
{
    if ((bool)iacObj.AttributeByName("Annotation").Value)
    {
        annotations.Add(iacObj);
        // Put the annotation out of sight
        iacObj.Coordinates = Rectangle.FromLTRB(
                            -iacObj.Coordinates.Left,
                            -iacObj.Coordinates.Top,
                            -iacObj.Coordinates.Right,
                            -iacObj.Coordinates.Bottom);
    }
    else
        iacObj.Delete(false);
}

ArrayList images = new ArrayList();
int i = 0;
foreach (Amyuni.PDFCreator.IacObject iacObj in annotations)
{
    // Back on sight
    iacObj.Coordinates = Rectangle.FromLTRB(
                        -iacObj.Coordinates.Left,
                        -iacObj.Coordinates.Top,
                        -iacObj.Coordinates.Right,
                        -iacObj.Coordinates.Bottom);
    //Draw the page
    Bitmap bmp = new Bitmap(1000, 1000);
    Graphics gr = Graphics.FromImage(bmp);
    IntPtr hdc = gr.GetHdc();
    document.DrawCurrentPage(hdc.ToInt32(), true);
    gr.ReleaseHdc();
    images.Add(bmp);
    // Bitmap.Save writes an image file (PNG by default), so use an image extension
    bmp.Save("c:\\temp\\image" + i + ".png");

    iacObj.Delete(false); // object not needed anymore
    i++;
}

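If needed, you can use the Coordinates property of the annotation object to extract the part of the resulting image that corresponds to the annotation.

If you want to extract all objects from a rectangular area (an annotation or otherwise), you can replace the loop that collects the annotations with a call to the IacDocument.GetObjectsInRectangle method.

Usual disclaimer applies.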
