25

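How can I programmatically search and replace some text in a large number of PDF files? I want to remove a URL that has been added to a set of files. I have been able to remove the link using JavaScript under Batch Processing in Adobe Acrobat Pro, but the link text remains. I have seen recommendations to use the TouchUp Text tool, which works manually, but I don't want to modify 1300 files by hand.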

4

10 Answers

17

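Finding text in a PDF is inherently difficult because of the graphical nature of the document format: the letters you are searching for may not be contiguous in the file. That said, CAM::PDF has some search-and-replace capabilities and heuristics. Try changepagestring.pl and see whether it works on your PDFs.

To install: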

 $ cpan install CAM::PDF
 # start a new terminal if this is your first cpan module
 $ changepagestring.pl input.pdf oldtext newtext output.pdf
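To cover all 1300 files from the question, the command above can be wrapped in a loop. Here is a minimal Python sketch of that idea; it is not part of CAM::PDF. The folder arguments and the `build_commands` and `run_batch` helpers are hypothetical names chosen for illustration. Only the `changepagestring.pl input.pdf oldtext newtext output.pdf` argument order is taken from the usage line above.

```python
import subprocess
from pathlib import Path


def build_commands(src_dir, dst_dir, old_text, new_text):
    """Return one changepagestring.pl command (as an argument list) per PDF in src_dir."""
    commands = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        out = Path(dst_dir) / pdf.name
        # Argument order follows the usage line above: input, old text, new text, output.
        commands.append(["changepagestring.pl", str(pdf), old_text, new_text, str(out)])
    return commands


def run_batch(src_dir, dst_dir, old_text, new_text):
    """Run every command and return the input files for which the script failed."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_commands(src_dir, dst_dir, old_text, new_text):
        # Assumes changepagestring.pl is on PATH after the cpan install.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd[1])
    return failed
```

Calling `run_batch("pdfs_in", "pdfs_out", "old text", "new text")` writes the results to a separate folder, so the originals stay untouched while you check whether the heuristics worked on your files.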
answered 2008-10-21T04:52:08.763
7

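I was getting desperate too. After installing about 10 PDF editors, all of which cost money, I still had no success:

pdftk plus a text editor is enough:

Replace Text in PDF Files

  • Use pdftk to uncompress the PDF page streams

    pdftk original.pdf output original.uncompressed.pdf uncompress

  • Replace the text in original.uncompressed.pdf (sometimes this works, sometimes it doesn't)

  • Repair the modified (and now broken) PDF

    pdftk original.uncompressed.pdf output original.uncompressed.fixed.pdf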

(from Joel Dare)
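The three steps above can also be scripted instead of using an editor by hand. The following is a rough Python sketch under some assumptions: the text is stored as one contiguous literal string with a simple single-byte encoding, and `pdftk` is on the PATH. The helper names (`escape_pdf_literal`, `replace_literal_text`, `replace_in_pdf`) are made up for this example. Changing the length of the text invalidates the offsets recorded in the file, which is why the second pdftk pass is still needed.

```python
import subprocess
import tempfile
from pathlib import Path


def escape_pdf_literal(text):
    """Escape backslashes and parentheses, which are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def replace_literal_text(data, old, new):
    """Replace old with new in uncompressed PDF bytes; return (new_data, match_count)."""
    old_b = escape_pdf_literal(old).encode("latin-1")
    new_b = escape_pdf_literal(new).encode("latin-1")
    return data.replace(old_b, new_b), data.count(old_b)


def replace_in_pdf(src, dst, old, new):
    """Uncompress with pdftk, patch the bytes, then let pdftk rebuild the file."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "uncompressed.pdf"
        subprocess.run(["pdftk", str(src), "output", str(raw), "uncompress"], check=True)
        data, count = replace_literal_text(raw.read_bytes(), old, new)
        raw.write_bytes(data)
        # Second pass repairs the now-inconsistent file, as in the last step above.
        subprocess.run(["pdftk", str(raw), "output", str(dst)], check=True)
    return count
```

A return value of 0 from `replace_in_pdf` means the text was not found as a contiguous string, which matches the "sometimes this works, sometimes it doesn't" caveat.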

answered 2009-05-28T11:48:49.593
1

You can use the 'redaction' feature in Adobe Acrobat Pro to find & replace all references in a single document in one step...not sure if it can be automated to multiple steps.

http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS5E28D332-9FF7-4569-AFAD-79AD60092D4D.w.html

answered 2010-07-28T17:44:47.320
1

I just finished trying out Infix on a text laden with diacritics, hoping to generate another text in which characters with double and composed diacritics are replaced by alternates with single diacritics. Infix is definitely a good solution for someone who does not want the trouble of understanding how programmatic solutions work. All the requested changes were made. I still need to work out how to reflow words when a change alters the layout of the text.

answered 2011-11-12T13:54:37.860
1

This is only half a solution, but I used the TouchUp tool combined with AppleScript's support for sending keystrokes to replace a string in thousands of table cells. Depending on how your pages are laid out, it could work for you. In my case I had to manually place the cursor at the beginning of every table (tens of tables, quite manageable for a manual process), but after that I replaced thousands of cells automatically.

answered 2012-12-18T14:07:17.853
1

The question is for a programmatic solution, but I will still share this free online tool which helped me mass replace text in some PDF files:

http://www.pdfdu.com/pdf-replace-text.aspx

I did not notice any ads or other modifications in the resulting PDF files after replacing the text.

I was not able to make the changes locally with the software I tried. I think the main problem was that I was missing the font used in the PDF and it did not work properly, even with Acrobat Pro. The online tool did not complain and produced a great result.

answered 2015-01-14T22:26:56.293
0

I am not sure I would want to do all the work of writing code to modify your 1300 files when there is a program that can do it for you. The other day, I used the Professional version of Infix to batch-modify almost 100 files using its "Find and Replace in Files" feature. It works great. I have evaluated other programs in the hope of finding find-and-replace functionality similar to Microsoft Word's, and Infix was the only one I found that can do it. Check out: http://www.iceni.com/infix-pro.htm

answered 2011-01-07T04:13:13.100
0

I suggest using the VeryPDF PDF Text Replacer Command Line software to batch replace text in PDF pages. You run pdftr.exe to do the replacement, for example:

pdftr.exe -contentreplace "My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "My Name=>D:\temp\myname.png*20*20" D:\in.pdf D:\out.pdf

pdftr.exe -pagerange 1-3 -contentreplace "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchtext "string" C:\in.pdf

pdftr.exe -pagerange 1 -searchtext "string" C:\in.pdf

pdftr.exe -pagerange 1 -searchandoverlaytext "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -overlaytextfontname "Arial" -overlaytextcolor FF0000 -overlaybgcolor 00FF00 -searchandoverlaytext "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -opw 123 -upw 456 -contentreplace "Old Text=>New Text||VeryPDF=>VeryDOC||My Name=>Your Name" D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "PDFcamp Printer=>VeryPDF Printer" -overlaytextfontsize 8 D:\in.pdf D:\out.pdf

pdftr.exe -searchandoverlaytext "PDFcamp Printer=>VeryPDF Printer" -overlaytextfontsize 80% D:\in.pdf D:\out.pdf
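When the replacement list is long, it may be easier to build the `old=>new||old=>new` argument in code. This is a small Python sketch based only on the syntax visible in the examples above; the helper names are invented here, and whether pdftr.exe offers any way to escape `=>` or `||` inside the text is unknown to me, so the sketch simply rejects such input.

```python
def content_replace_arg(replacements):
    """Join (old, new) pairs as 'old=>new||old2=>new2', as in the examples above."""
    parts = []
    for old, new in replacements:
        for value in (old, new):
            # Assumption: the separators cannot be escaped, so refuse them outright.
            if "=>" in value or "||" in value:
                raise ValueError(f"separator found in replacement text: {value!r}")
        parts.append(f"{old}=>{new}")
    return "||".join(parts)


def pdftr_command(in_pdf, out_pdf, replacements, exe="pdftr.exe"):
    """Return the argument list for one -contentreplace run (pass it to subprocess.run)."""
    return [exe, "-contentreplace", content_replace_arg(replacements), in_pdf, out_pdf]
```

Passing the list to `subprocess.run` avoids shell quoting problems with spaces in the replacement text.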

answered 2017-07-14T00:30:17.137
0

It appears that even in uncompressed PDFs, text is sometimes stored in an odd format. This makes "normal" text replacement, a la sed, either fail or become non-trivial.

I couldn't find anything that worked with glyph spacing offsets, which seem very common in PDFs. For example, the phrase "Other information" is stored like this:

 [(O)-16(ther i)-20(nformati)-11(on )]TJ
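To make that layout concrete, here is a short Python sketch (not the code of any tool mentioned in this thread) that joins the string pieces of a TJ array and, when the joined text contains the search string, rewrites the array as a single string. It assumes literal `(...)` strings only, compares against the raw bytes without decoding escapes or font encodings, and discards the spacing numbers, so letter spacing in rewritten arrays may shift slightly. The names `TJ_ARRAY`, `tj_text` and `replace_in_tj` are made up for this example.

```python
import re

# One literal string such as (ther i); backslash escapes like \) may appear inside.
STRING = rb"\((?:\\.|[^\\()])*\)"
# A TJ operand: "[", then strings mixed with spacing numbers, then "]" and the TJ operator.
TJ_ARRAY = re.compile(rb"\[((?:" + STRING + rb"|[^\[\]()])*)\]\s*TJ")


def tj_text(array_body):
    """Join the string pieces of one TJ array, dropping the spacing numbers."""
    pieces = re.findall(STRING, array_body)
    return b"".join(piece[1:-1] for piece in pieces)


def replace_in_tj(content, old, new):
    """Rewrite each TJ array whose joined text contains old as a one-string array."""
    def rewrite(match):
        text = tj_text(match.group(1))
        if old not in text:
            return match.group(0)  # arrays without a match stay byte-for-byte identical
        return b"[(" + text.replace(old, new) + b")]TJ"

    return TJ_ARRAY.sub(rewrite, content)
```

For the example above, `tj_text` recovers `Other information ` (with the trailing space) from the four pieces.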

I have attempted to write a tool that satisfies this myself. It works OK for common use cases. Check it out here.

First uncompress your PDF, then cd into the checked-out git repository and run:

Syntax

 $ crystal replaceinpdf.cr input_filename.pdf "something you want replaced" "what you want it replaced with" output.pdf

Enjoy!

answered 2021-06-11T06:16:20.507
-1

Although this is quite an old thread, I wanted to share a Node.js package option for searching and replacing text in a PDF: Aspose.PDF Cloud SDK for Node.js. It is a paid product, but it provides 150 free API calls per month.


const { PdfApi } = require("asposepdfcloud");
const { TextReplaceListRequest }= require("asposepdfcloud/src/models/textReplaceListRequest");
const { TextReplace }= require("asposepdfcloud/src/models/textReplace");

// Get Client ID and Client Secret from https://dashboard.aspose.cloud/
const pdfApi = new PdfApi("xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx", "xxxxxxxxxxxxxxxxxxxxxx");
const fs = require('fs'); // only needed by the commented-out upload/download code

const name = "02_pages.pdf";
const remoteTempFolder = "Temp";
//const localTestDataFolder = "C:\\Temp";
//const path = remoteTempFolder + "\\" + name;
//const outputFile= "Replace_output.pdf";


// Upload File
//pdfApi.uploadFile(path, fs.readFileSync(localTestDataFolder + "\\" + name)).then((result) => {  
//                     console.log("Uploaded File");    
//                    }).catch(function(err) {
    // Deal with an error
//    console.log(err);
//});
    
const textReplace= new TextReplace();
        textReplace.oldValue= "origami"; 
        textReplace.newValue= "aspose";
        textReplace.regex= false;

const textReplace1= new TextReplace();
        textReplace1.oldValue= "candy"; 
        textReplace1.newValue= "biscuit";
        textReplace1.regex= false;
    
const trr = new TextReplaceListRequest();
            trr.textReplaces = [textReplace,textReplace1];


// Replace text
pdfApi.postDocumentTextReplace(name, trr, null, remoteTempFolder).then((result) => {    
    console.log(result.body.code);                  
}).catch(function(err) {
    // Deal with an error
    console.log(err);
});

//Download file
//const outputPath = "C:/Temp/" + outputFile;

//pdfApi.downloadFile(path).then((result) => {    
//  fs.writeFileSync(outputPath, result.body);
//    console.log("File Downloaded");    
//}).catch(function(err) {
    // Deal with an error
//    console.log(err);
//});
answered 2021-05-07T17:36:22.190