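Adobe Acrobat can redact PDF files (that is, actually remove the information rather than simply drawing a black box over it). I would like to use this feature programmatically. To redact using the GUI, you select the Mark for Redaction tool, draw it over the text to be redacted, and then apply the redactions.
Is there a way to do this programmatically, via AppleScript or some other means?
I know the (x, y) location of the text to be redacted.
Thanks!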
In order to properly redact a PDF, you need to Alter The Content Stream. This is Very Hard.
If you can find the portion of the content stream that draws the text you want removed, you're halfway there.
The other half is figuring out how to change the content stream such that you don't modify the rest of the document. If the next text draw operator is preceded by a "Tm" command (set the text matrix, which absolutely positions the next piece of text), it's easy. If not... you have to calculate the exact width of the text you're removing (several different PDF libraries can do this), and alter the drawing commands to skip over that much space.
For example:
BT
/F1 10 Tf
1 0 0 1 30 720 Tm
(Here's some text, and you only want to REDACT that upper case "redact" over there)Tj
T*
(This text is positioned relative to the previous line)Tj
1 0 0 1 30 650 Tm
(This text is positioned absolutely, starting at 30, 650)Tj
So you'd have to break up that first (...)Tj line into (Here's some text, and you only want to)Tj, N 0 Td, and (that upper case "redact" over there)Tj, where the 'N' properly adjusts the position of the following text drawing operation such that it lands in EXACTLY THE SAME SPOT. So you'd need to know the precise width of " REDACT " using the font resource /F1 (whatever that turned out to be), sized to 10 points.
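A minimal Python sketch of that split, under some loud assumptions: the glyph widths in WIDTHS are made up (a real tool would read them from the /F1 font resource), the string is plain single-byte text, and character spacing, word spacing and horizontal scaling are all at their defaults. The helper names are hypothetical. One detail worth spelling out: Td is measured from the start of the current line, not from where the previous Tj stopped, so the N computed here covers the kept prefix plus the removed text.

```python
def text_width(text, widths, font_size, default=500):
    """Width of `text` in text-space units, ignoring Tc, Tw and Tz."""
    units = sum(widths.get(ch, default) for ch in text)
    return units * font_size / 1000.0


def escape_literal(text):
    """Escape the characters that are special inside a (...) PDF string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def split_tj(text, start, end, widths, font_size):
    """Drop text[start:end] and emit operators that keep the suffix in place."""
    prefix, redacted, suffix = text[:start], text[start:end], text[end:]
    # Td offsets from the start of the current line, so the offset spans
    # the kept prefix plus the removed text.
    offset = text_width(prefix + redacted, widths, font_size)
    return "({})Tj {:g} 0 Td ({})Tj".format(
        escape_literal(prefix), offset, escape_literal(suffix))


# Hypothetical widths in 1/1000 text-space units, for illustration only.
WIDTHS = {"a": 500, "b": 600, " ": 250}
print(split_tj("aa bb aa", 2, 6, WIDTHS, 10))
```

Keep in mind that Td also moves the start of the line, so a later T* would begin from the shifted x position unless you compensate for it.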
Just to make your life more exciting, you have to worry about kerned text too. You can provide little spacing adjustments inline with text thusly:
(This is taken from the first text drawn in the PDF Spec)
[(Adobe Sys)5(t)1(ems Inc)5(orporated)5( 20)5(08 \226 All rights)5( reser)-9(ved)]TJ
To properly redact "Incorporated", you need to determine that it's been split across two strings, and adjust the positioning of the string following it so it's in Exactly The Same Spot.
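Here is a rough Python sketch of the TJ case, assuming the array has already been parsed into a list of decoded strings and numbers, and that every glyph is 500 units wide unless listed in a (hypothetical) widths table. It removes the target word even when it straddles several strings and replaces it with one adjustment number. Numbers in a TJ array are subtracted from the position, so a negative value moves the following text to the right.

```python
def redact_tj(array, target, widths, default=500):
    """Remove `target` from a parsed TJ array without moving what follows."""
    # Flatten into single characters and kerning numbers, in drawing order.
    items = []
    for element in array:
        if isinstance(element, str):
            items.extend(("char", ch) for ch in element)
        else:
            items.append(("kern", element))
    text = "".join(value for kind, value in items if kind == "char")
    start = text.find(target)
    if start < 0:
        return list(array)
    end = start + len(target)

    out, buf, pos, gap = [], [], 0, 0

    def flush():
        if buf:
            out.append("".join(buf))
            buf.clear()

    for kind, value in items:
        if kind == "char":
            removed = start <= pos < end
            pos += 1
            if removed:
                gap -= widths.get(value, default)  # skip over the glyph
                if pos == end:
                    flush()
                    out.append(gap)
            else:
                buf.append(value)
        elif start < pos < end:
            gap += value  # fold kerning inside the removed span into the gap
        else:
            flush()
            out.append(value)
    flush()
    return out


spec_line = ["Adobe Sys", 5, "t", 1, "ems Inc", 5, "orporated", 5, " 20", 5, "08"]
print(redact_tj(spec_line, "Incorporated", {}))
```

The result still has to be serialized back into the content stream with proper string escaping, which this sketch leaves out.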
And strings can be <DEADBEEF> hex values rather than (plain old ascii).
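As a small illustration, a hex string can be turned back into bytes along these lines. The function name is hypothetical, and the sketch only covers the hex syntax itself: white space inside the brackets is ignored and an odd number of digits is padded with a trailing 0. Mapping the resulting bytes to characters still depends on the font's encoding.

```python
def decode_pdf_hex_string(token):
    """Decode a PDF hex string token such as <DEADBEEF> into raw bytes."""
    token = token.strip()
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a hex string token: %r" % token)
    digits = "".join(token[1:-1].split())  # white space is insignificant
    if len(digits) % 2:
        digits += "0"  # a missing final digit is taken to be 0
    return bytes.fromhex(digits)


print(decode_pdf_hex_string("<48 65 6C 6C 6F>"))
```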
Get the idea? And I haven't covered all the possibilities here, just the most common ones.
Like I said: This is Very Hard.
There's an Acrobat plugin called Appligent Redax (no connection) that lets you draw annotations (or generate them via templates, regex, etc.) and then run their code to handle the redaction. It should be possible to programmatically create their annotations and perhaps even activate their plugin: JavaScript in a document can run a menu item.
Here is a web page that covers what you need to do. As others have mentioned, you will have to do it in JavaScript, since that is Acrobat's native scripting language.
http://acrobatusers.com/tutorials/2008/07/auto_redaction_with_javascript
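Although I use Acrobat a lot, surprisingly I have never needed to script it. I looked at the scripting dictionary, and it looks like you have to write the JavaScript file, save it, and then open it with AppleScript, if that is what you want to do (say, as a service).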
tell application "Adobe Acrobat Professional"
    do script "this.info.title;"
end tell
Here is Adobe's JavaScript for Acrobat documentation.
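You can use GroupDocs.Redaction for .NET to programmatically redact text in PDF documents. It supports exact-phrase, case-sensitive, and regular-expression redactions of text. This is how you can perform an exact-phrase redaction: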
using (Document doc = Redactor.Load("D:\\candy.pdf"))
{
    // Replace every occurrence of "candy" with "[redacted]".
    doc.RedactWith(new ExactPhraseRedaction("candy", new ReplacementOptions("[redacted]")));
    // Save the document to "*_Redacted.*" file.
    doc.Save(new SaveOptions() { AddSuffix = true, RasterizeToPDF = false });
}
Disclosure: I work as a developer evangelist at GroupDocs.
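In Adobe Acrobat, you can do this using JavaScript, which can be invoked on many different events.
If you want to do this in a separate application, there are many different tools on various platforms that can create and manipulate PDF documents, although I have yet to find a feature-rich open-source library that even comes close to some of these products.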
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/default.aspx
http://www.aspose.com/categories/java-components/aspose.pdf-for-java/default.aspx
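iText is my personal favorite and is worth every penny.
In general, editing PDFs is a fairly complex task.
You can redact PDF pages for free with doXiview (https://doxiview.cib.de). The redaction options are on the right-hand side.
Another approach is to do it programmatically with the CIB pdf toolbox (https://pdftoolbox.cib.de/).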