php - How can I get the text from a copy-protected PDF file, or from one that uses different fonts?

Question

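I am using pdfparser to copy text out of PDF files, but some PDF files are copy-protected or use different fonts, and pdfparser does not work on them. Is it possible to get the text from a copy-protected PDF?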

Here is my code:

<?php
// Show every error and warning while debugging.
error_reporting(E_ALL);
ini_set('display_errors', 1);

// Include the Composer autoloader if not already done.
include 'vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('tests.pdf');

// Retrieve all pages from the PDF file.
$pages  = $pdf->getPages();

// Loop over each page to extract text.
foreach ($pages as $page) {
    echo utf8_encode($page->getText());
}

?>

When I run this code I get no errors or warnings. It only prints whitespace. I also tried UTF-8 encoding, but it still does not work.

score 0 · Accepted Answer

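If the author of the PDF set the document's permission flags to disallow copying or extracting text and graphics, then you should take that into account. However, not all PDF software honors these restrictions.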
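Before blaming the parser, it may help to check whether the file declares encryption at all, since the permission flags live in the PDF's encryption dictionary. The following is only a rough sketch, not part of pdfparser: `pdfLooksEncrypted()` and `isBlankText()` are hypothetical helper names, and the check is a heuristic that simply looks for an `/Encrypt` key in the raw bytes. It assumes the file from the question, `tests.pdf`, sits next to the script.

```php
<?php
// Hypothetical helper (not a pdfparser API): heuristic check for an
// /Encrypt entry in the raw PDF bytes. The word boundary keeps it from
// matching longer keys such as /EncryptMetadata.
function pdfLooksEncrypted(string $bytes): bool
{
    return preg_match('~/Encrypt\b~', $bytes) === 1;
}

// Hypothetical helper: true when extracted text is empty or whitespace only,
// which is the symptom described in the question.
function isBlankText(string $text): bool
{
    return trim($text) === '';
}

// Usage sketch; 'tests.pdf' is the file name used in the question.
$path = 'tests.pdf';
if (is_readable($path)) {
    $bytes = file_get_contents($path);
    echo pdfLooksEncrypted($bytes)
        ? "The file references an /Encrypt dictionary.\n"
        : "No /Encrypt entry found; the blank output likely has another cause.\n";
}
```

If no `/Encrypt` entry turns up and the extracted text is still blank, the cause is more likely the content itself, for example scanned page images or fonts embedded without a Unicode mapping, in which case the permission flags are not what is blocking extraction.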
