I installed PdfParser with Composer, and it works when I open the page cron.php in a browser: the PDF is parsed.
This is my code in cron.php:
include 'vendor/autoload.php';
//include $_SERVER["DOCUMENT_ROOT"]. '/vendor/autoload.php';
//require 'vendor/autoload.php';
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile("$path/$fname");
$text = $pdf->getText();
$pdf = $parser->parseFile("vendor/smalot/pdfparser/samples/1.pdf");
$text = $pdf->getText();
echo $text;
exit();
I set up a cron job on an Ubuntu 16 server to run cron.php with the following entry:
* * * * * /usr/bin/php -q /var/www/html/..../public_html/post/cron.php >>/var/www/html/..../public_html/post/log/cron.php.log 2>&1
The job runs, but the log shows:
Fatal error: Uncaught Error: Class 'Smalot\PdfParser\Parser' not found in /var/www/html/..../public_html/post/cron.php:161
Stack trace:
#0 /var/www/html/..../public_html/post/cron.php(62): getpart(Resource id #8, 451, Object(stdClass), 2)
#1 /var/www/html/..../public_html/post/cron.php(378): getmsg(Resource id #8, 451)
#2 {main}
thrown in /var/www/html/..../public_html/post/cron.php on line 161
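A small diagnostic can show whether the two runs resolve relative paths from different places. The PHP CLI does not change into the script's directory, and cron usually starts a job in the user's home directory, so a relative path such as `vendor/...` may point somewhere else than it does under the web server. The sketch below is only an illustration: `describe_paths()` is a hypothetical helper, not part of PdfParser, and it assumes it is placed in a file next to the `vendor` folder.

```php
<?php
// Hypothetical diagnostic helper (not part of PdfParser): reports where
// relative paths would be resolved from in the current run.
function describe_paths(string $relative): array
{
    return [
        'sapi'        => PHP_SAPI,   // "cli" when started by cron
        'cwd'         => getcwd(),   // base directory for relative paths
        'script_dir'  => __DIR__,    // directory of the file containing this code
        'from_cwd'    => file_exists(getcwd() . '/' . $relative),
        'from_script' => file_exists(__DIR__ . '/' . $relative),
    ];
}

// Run this once from the browser and once from crontab, then compare.
print_r(describe_paths('vendor/autoload.php'));
```

If `cwd` and `script_dir` differ in the cron log but match in the browser, the relative paths are the likely cause of the missing class.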
This is my autoload.php:
<?php
/*
    Using PDFParser without Composer

    Folder structure
    ================
    webroot
        pdfdemos
            INV001.pdf      # test PDF file to extract text from for demo
            test.php        # our operational demo file
        vendor
            autoload.php
            tecnickcom
                tcpdf       # unpack v6.2.12 from release at https://github.com/tecnickcom/TCPDF/archive/6.2.12.tar.gz
            smalot
                pdfparser   # unpack from git master https://github.com/smalot/pdfparser/archive/master.zip release is 0.9.25 dated 2015-09-15
                    docs        # optional
                    samples     # optional
                    src
                        Smalot
                            PdfParser
*/
$vendorDir = 'vendor';
//$vendorDir = $_SERVER["DOCUMENT_ROOT"] . '/vendor';
$tcpdf_files = Array(
'Datamatrix' => $vendorDir . '/tecnickcom/tcpdf/include/barcodes/datamatrix.php',
'PDF417' => $vendorDir . '/tecnickcom/tcpdf/include/barcodes/pdf417.php',
'QRcode' => $vendorDir . '/tecnickcom/tcpdf/include/barcodes/qrcode.php',
'TCPDF' => $vendorDir . '/tecnickcom/tcpdf/tcpdf.php',
'TCPDF2DBarcode' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_barcodes_2d.php',
'TCPDFBarcode' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_barcodes_1d.php',
'TCPDF_COLORS' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_colors.php',
'TCPDF_FILTERS' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_filters.php',
'TCPDF_FONTS' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_fonts.php',
'TCPDF_FONT_DATA' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_font_data.php',
'TCPDF_IMAGES' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_images.php',
'TCPDF_IMPORT' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_import.php',
'TCPDF_PARSER' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_parser.php',
'TCPDF_STATIC' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_static.php'
);
foreach ($tcpdf_files as $key => $file) {
include_once $file;
}
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Parser.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Document.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Header.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/PDFObject.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Page.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Pages.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementArray.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementBoolean.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementString.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementDate.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementHexa.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementMissing.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementName.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementNull.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementNumeric.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementStruct.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementXRef.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/StandardEncoding.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/ISOLatin1Encoding.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/ISOLatin9Encoding.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/MacRomanEncoding.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/WinAnsiEncoding.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontCIDFontType0.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontCIDFontType2.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontTrueType.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontType0.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontType1.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/XObject/Form.php";
include_once $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/XObject/Image.php";
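The autoload.php above builds every path from the relative string `'vendor'`, so each include depends on the working directory of the process. One possible way to remove that dependency is to build the paths from `__DIR__`, which is always the directory of the file containing the code. The following is a minimal sketch under that assumption, not the library's own loader: `vendor_path()` is a hypothetical helper, and only one of the include lines is shown.

```php
<?php
// Sketch only: build include paths from a fixed base directory instead of
// the current working directory. vendor_path() is a hypothetical helper,
// not part of PdfParser or TCPDF.
function vendor_path(string $baseDir, string $relative): string
{
    return rtrim($baseDir, '/') . '/' . ltrim($relative, '/');
}

// Inside vendor/autoload.php, __DIR__ is the vendor directory itself.
$vendorDir = __DIR__;
$parserFile = vendor_path($vendorDir, 'smalot/pdfparser/src/Smalot/PdfParser/Parser.php');

if (is_file($parserFile)) {
    require_once $parserFile;
} else {
    // Report the missing file so it shows up in the cron log.
    error_log("Missing: $parserFile");
}
```

With this approach, cron.php would load the file as `require __DIR__ . '/vendor/autoload.php';`, so neither file relies on where the process was started.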
This is the file that, according to the log, contains the missing class, public_html/post/vendor/smalot/pdfparser/src/Smalot/PdfParser/Parser.php:
<?php
/**
* @file
* This file is part of the PdfParser library.
*
* @author Sébastien MALOT <sebastien@malot.fr>
* @date 2017-01-03
* @license LGPLv3
* @url <https://github.com/smalot/pdfparser>
*
* PdfParser is a pdf library written in PHP, extraction oriented.
* Copyright (C) 2017 - Sébastien MALOT <sebastien@malot.fr>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with this program.
* If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
*
*/
namespace Smalot\PdfParser;
use Smalot\PdfParser\Element\ElementArray;
use Smalot\PdfParser\Element\ElementBoolean;
use Smalot\PdfParser\Element\ElementDate;
use Smalot\PdfParser\Element\ElementHexa;
use Smalot\PdfParser\Element\ElementName;
use Smalot\PdfParser\Element\ElementNull;
use Smalot\PdfParser\Element\ElementNumeric;
use Smalot\PdfParser\Element\ElementString;
use Smalot\PdfParser\Element\ElementXRef;
/**
* Class Parser
*
* @package Smalot\PdfParser
*/
class Parser
{
/**
* @var PDFObject[]
*/
protected $objects = array();
/**
*
*/
public function __construct()
{
}
/**
* @param $filename
* @return Document
* @throws \Exception
*/
public function parseFile($filename)
{
$content = file_get_contents($filename);
/*
* 2018/06/20 @doganoo as multiple times a
* users have complained that the parseFile()
* method dies silently, it is an better option
* to remove the error control operator (@) and
* let the users know that the method throws an exception
* by adding @throws tag to PHPDoc.
*
* See here for an example: https://github.com/smalot/pdfparser/issues/204
*/
return $this->parseContent($content);
}
/**
* @param $content
* @return Document
* @throws \Exception
*/
public function parseContent($content)
{
// Create structure using TCPDF Parser.
ob_start();
@$parser = new \TCPDF_PARSER(ltrim($content));
list($xref, $data) = $parser->getParsedData();
unset($parser);
ob_end_clean();
if (isset($xref['trailer']['encrypt'])) {
throw new \Exception('Secured pdf file are currently not supported.');
}
if (empty($data)) {
throw new \Exception('Object list not found. Possible secured file.');
}
// Create destination object.
$document = new Document();
$this->objects = array();
foreach ($data as $id => $structure) {
$this->parseObject($id, $structure, $document);
unset($data[$id]);
}
$document->setTrailer($this->parseTrailer($xref['trailer'], $document));
$document->setObjects($this->objects);
return $document;
}
protected function parseTrailer($structure, $document)
{
$trailer = array();
foreach ($structure as $name => $values) {
$name = ucfirst($name);
if (is_numeric($values)) {
$trailer[$name] = new ElementNumeric($values, $document);
} elseif (is_array($values)) {
$value = $this->parseTrailer($values, null);
$trailer[$name] = new ElementArray($value, null);
} elseif (strpos($values, '_') !== false) {
$trailer[$name] = new ElementXRef($values, $document);
} else {
$trailer[$name] = $this->parseHeaderElement('(', $values, $document);
}
}
return new Header($trailer, $document);
}
/**
* @param string $id
* @param array $structure
* @param Document $document
*/
protected function parseObject($id, $structure, $document)
{
$header = new Header(array(), $document);
$content = '';
foreach ($structure as $position => $part) {
switch ($part[0]) {
case '[':
$elements = array();
foreach ($part[1] as $sub_element) {
$sub_type = $sub_element[0];
$sub_value = $sub_element[1];
$elements[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
}
$header = new Header($elements, $document);
break;
case '<<':
$header = $this->parseHeader($part[1], $document);
break;
case 'stream':
$content = isset($part[3][0]) ? $part[3][0] : $part[1];
if ($header->get('Type')->equals('ObjStm')) {
$match = array();
// Split xrefs and contents.
preg_match('/^((\d+\s+\d+\s*)*)(.*)$/s', $content, $match);
$content = $match[3];
// Extract xrefs.
$xrefs = preg_split(
'/(\d+\s+\d+\s*)/s',
$match[1],
-1,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);
$table = array();
foreach ($xrefs as $xref) {
list($id, $position) = explode(' ', trim($xref));
$table[$position] = $id;
}
ksort($table);
$ids = array_values($table);
$positions = array_keys($table);
foreach ($positions as $index => $position) {
$id = $ids[$index] . '_0';
$next_position = isset($positions[$index + 1]) ? $positions[$index + 1] : strlen($content);
$sub_content = substr($content, $position, $next_position - $position);
$sub_header = Header::parse($sub_content, $document);
$object = PDFObject::factory($document, $sub_header, '');
$this->objects[$id] = $object;
}
// It is not necessary to store this content.
$content = '';
return;
}
break;
default:
if ($part != 'null') {
$element = $this->parseHeaderElement($part[0], $part[1], $document);
if ($element) {
$header = new Header(array($element), $document);
}
}
break;
}
}
if (!isset($this->objects[$id])) {
$this->objects[$id] = PDFObject::factory($document, $header, $content);
}
}
/**
* @param array $structure
* @param Document $document
*
* @return Header
* @throws \Exception
*/
protected function parseHeader($structure, $document)
{
$elements = array();
$count = count($structure);
for ($position = 0; $position < $count; $position += 2) {
$name = $structure[$position][1];
$type = $structure[$position + 1][0];
$value = $structure[$position + 1][1];
$elements[$name] = $this->parseHeaderElement($type, $value, $document);
}
return new Header($elements, $document);
}
/**
* @param $type
* @param $value
* @param $document
*
* @return Element|Header
* @throws \Exception
*/
protected function parseHeaderElement($type, $value, $document)
{
switch ($type) {
case '<<':
return $this->parseHeader($value, $document);
case 'numeric':
return new ElementNumeric($value, $document);
case 'boolean':
return new ElementBoolean($value, $document);
case 'null':
return new ElementNull($value, $document);
case '(':
if ($date = ElementDate::parse('(' . $value . ')', $document)) {
return $date;
} else {
return ElementString::parse('(' . $value . ')', $document);
}
case '<':
return $this->parseHeaderElement('(', ElementHexa::decode($value, $document), $document);
case '/':
return ElementName::parse('/' . $value, $document);
case 'ojbref': // old mistake in tcpdf parser
case 'objref':
return new ElementXRef($value, $document);
case '[':
$values = array();
foreach ($value as $sub_element) {
$sub_type = $sub_element[0];
$sub_value = $sub_element[1];
$values[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
}
return new ElementArray($values, $document);
case 'endstream':
case 'obj': //I don't know what it means but got my project fixed.
case '':
// Nothing to do with.
break;
default:
throw new \Exception('Invalid type: "' . $type . '".');
}
}
}
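The PDF is parsed when I start cron.php manually, but not when it runs from crontab. I have been stuck on this for 4 days and I don't know where the problem is. I would appreciate your advice. Thanks, Emil.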