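I found that the line I was trying to edit is not actually a contiguous run of characters in the PDF; it sits inside a TJ operator within a BT block. I can't see any provision in the CAM::PDF library for text that lives in a TJ line (though perhaps there is one, @ChrisDolan?), so CAM::PDF could not manipulate or "swap out" that text. After decompressing all the streams (where applicable), I found this TJ line, which contains the text I wanted to manipulate: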
[(D)-20(a)24(t)62(e)-46(:)86( )-46(1)52(5)-37(.)70(0)-37(2)52(.)-20(2)52(0)-37(1)52(9)] TJ
I don't believe CAM::PDF can act on TJ lines; perhaps it can only act on Tj lines.
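To see why a plain string search misses this text, it helps to rebuild what the TJ line actually displays. The sketch below is only an illustration: `tj_to_text` is a hypothetical helper of mine, not something CAM::PDF provides, and it assumes literal `(...)` strings only (no hex `<...>` strings, no nested unescaped parentheses).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): join the string operands of a
# TJ array, dropping the numeric kerning adjustments between them.
sub tj_to_text {
    my ($line) = @_;
    my ($array) = $line =~ /\[(.*)\]\s*TJ/s or return;
    my $text = '';
    # Match each (...) literal string, allowing backslash escapes inside.
    while ($array =~ /\(((?:\\.|[^\\()])*)\)/g) {
        my $s = $1;
        $s =~ s/\\([()\\])/$1/g;    # unescape \( \) and \\
        $text .= $s;
    }
    return $text;
}

my $tj = '[(D)-20(a)24(t)62(e)-46(:)86( )-46(1)52(5)-37(.)70(0)-37(2)52(.)-20(2)52(0)-37(1)52(9)] TJ';
print tj_to_text($tj), "\n";    # prints "Date: 15.02.2019"
```

The numbers between the strings are per-glyph position adjustments, which is why the visible text never appears as one searchable string in the stream.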
For anyone looking for a quick fix to the same problem, this "dirty" script worked for me in this instance:
#!/usr/bin/perl
use strict;
use Compress::Raw::Zlib;
use bytes;
open(OUT, '>:raw', "newfromoldscript.pdf") || die("can't open output file: $!"); # raw mode: PDF is binary
my $fname = 'Order fulfilment process flowchart.pdf';
open(FILE, '<:raw', $fname) || die("can't open($fname): $!");
$/ = undef;
my $file = <FILE>;
my $file_len = length($file);
my $i = 0;
my $offset = 0;
my $o;
do {
$o = doX(substr($file, $offset, $file_len), $i);
$offset+=$o;
$i++;
} while($o && $i< 100);
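# Copy the next object's stream to OUT, rewriting it if a %change pattern matches.
# Returns the number of bytes consumed from $file, or 0 once no stream remains.
sub doX {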
my $file = shift;
my $i = shift;
my $stream = index($file, "\nstream");
if ($stream < 0) {
print OUT $file;
return 0;
}
$stream++;
my $deflate = 1;
my $line_before = rindex(substr($file,0,$stream), "<<");
print OUT substr($file,0,$line_before);
my $x = substr($file, $line_before,$stream-$line_before);
if ($i == 22) {
print "";
}
my $stream_len;
if ($x =~ /FlateDecode\/Length (\d+)>>/) {
$stream_len = $1;
}
if ($x =~ /FlateDecode\/Length (\d+)\//) {
print "Warn Object $i has len/len what the even is this?\n";
$stream_len = $1;
}
if ($x =~ /XML\/Length (\d+)>>/) {
$deflate = 0;
$stream_len = $1;
}
if (!$stream_len) {
die("I fail with no stream len : $x");
}
print "-->$line_before,$i,$stream=$stream_len=$x<--\n";
my $bytes = substr($file, $stream+8,$stream_len);
my $orig_bytes = $bytes; # inflate seems to mangle bytes, so take a copy
my $o;
my $d = Compress::Raw::Zlib::Inflate->new();
if ($deflate) {
$d->inflate($bytes,$o);
} else {
$o = $bytes;
}
my $orig_x = $x;
my $changes;
my %change = (
'-20(2)52(0)-37(.)52(.)' => '-20(2)52(0)-37(2)52(0)', #trialling different reg ex's here
'-37(1)52(9)'=>'-37(2)52(0)', #reg ex's
'Date: 15.02.2019'=>'Date: 12.02.2020',
'\[(A)[\d-]+(p)[\d-]+(p)[\d-]+(r)[\d-]+(o)[\d-]+(ve)[\d-]+(d)[\d-]+( )[\d-]+(B[^\]]+\] TJ'=>'(Approved By: George W) Tj??G-TAG??' #scrap the whole TJ, replace for Tj
);
foreach my $re (keys %change) {
my $to = $change{$re};
$re =~ s/([\(\)])/\\$1/g; # escape round brackets so they match literally
print $re;
open (GW, ">tmp.gw");
print GW $re;
close (GW);
if ($o=~/$re/m) {
$o =~ s/$re/$to/mg;
print $o;
$changes++;
}
}
if ($changes) {
print "\n MADE CHANGES\n";
#split, get rid of the ? mark tag
my @remains = split('\?\?G-TAG\?\?', $o);
my $firsthalf = $remains[0];
my $secondhalf = $remains[1];
#reverse the string
$firsthalf = scalar reverse ($firsthalf);
if ($firsthalf =~ m/fT 52\.8 2F/){print "FOUND THE REVERSE"}
$firsthalf =~ s/fT 52\.8 2F/fT 52\.8 0F/;
#reg ex to back track to the nearest and thus relevant Font/F and set it to F0
#put it back in correct orientation
$firsthalf = scalar reverse ($firsthalf);
$o = join("", $firsthalf, $secondhalf);
open (WEIRD, ">tmp.weird");
print WEIRD $firsthalf;
close (WEIRD);
$changes++;
my $d = Compress::Raw::Zlib::Deflate->new();
my $obytes;
my $obytes2;
my $status = $d->deflate($o, $obytes);
$d->flush($obytes2);
$bytes = $obytes . $obytes2;
if (length($bytes) != $stream_len) {
my $l = length($bytes);
print "-->$x<--\n";
warn("what do we do here $l != $stream_len");
$orig_x =~ s/Length $stream_len\b/Length $l/; # only touch the /Length entry, not another matching number
}
print OUT $orig_x . "stream\r\n";
print OUT $bytes . "\r";
} else {
print OUT $orig_x . "stream\r\n";
print OUT $orig_bytes . "\r";
}
open(TMP,">out/tmp.$i.bytes");
print TMP $o;
close(TMP);
return $stream + 8 + $stream_len + 1;
}
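Essentially, I swapped a TJ for a Tj to change someone else's name on the document to mine, which made it easier to insert my change (though it may be messy). To get it to display in capital letters, I had to reverse the string and swap the font (F) it was under, F2, for F0.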
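The reversal is only there so that the substitution hits the last font selection before the inserted Tj rather than the first one in the stream. A greedy prefix match should do the same without reversing. This is a minimal sketch, not what the script does: `swap_last_font` is a hypothetical helper, and the sample content string is made up around the `F2 8.25 Tf` operator the script looks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace only the LAST occurrence of $from in $content.
sub swap_last_font {
    my ($content, $from, $to) = @_;
    # The greedy (.*) consumes as much as it can, so the literal \Q...\E
    # match lands on the final occurrence of $from.
    $content =~ s/\A(.*)\Q$from\E/$1$to/s;
    return $content;
}

# Made-up content: everything that precedes the ??G-TAG?? marker.
my $firsthalf = "BT /F2 8.25 Tf (a) Tj ET\nBT /F2 8.25 Tf ";
my $fixed = swap_last_font($firsthalf, 'F2 8.25 Tf', 'F0 8.25 Tf');
print $fixed, "\n";
```

Only the second `/F2 8.25 Tf` becomes `/F0 8.25 Tf`; the earlier text block keeps its font.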
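For the date-related TJ line, I swapped the characters inside the TJ for the date I wanted, which meant I had to abide by the "unfriendly" syntax that TJ operator lines follow.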
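Instead of hand-writing patterns such as `-37(1)52(9)`, the same change can be sketched as "refill each one-character slot in order and leave the kerning numbers alone". `retext_tj` below is a hypothetical helper of mine; it assumes every operand is a single unescaped character and that the new text has exactly as many characters as the old one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: put the characters of $new_text into the existing
# one-character (x) slots of a TJ array, keeping the kerning values.
sub retext_tj {
    my ($tj, $new_text) = @_;
    my @chars = split //, $new_text;
    s/([()\\])/\\$1/ for @chars;                # escape ( ) \ for PDF strings
    my $slots = () = $tj =~ /\([^()\\]\)/g;     # count one-character slots
    return unless $slots == @chars;             # refuse a length mismatch
    $tj =~ s/\([^()\\]\)/'(' . shift(@chars) . ')'/ge;
    return $tj;
}

my $old = '[(D)-20(a)24(t)62(e)-46(:)86( )-46(1)52(5)-37(.)70(0)-37(2)52(.)-20(2)52(0)-37(1)52(9)] TJ';
print retext_tj($old, 'Date: 12.02.2020'), "\n";
```

The kerning values were tuned for the original glyphs, so spacing around the changed digits may be very slightly off, which is the same trade-off the manual edit makes.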