I installed PdfParser with Composer, and it works when I open the page cron.php in the browser: the PDF is parsed.

Here is my code in cron.php:

include 'vendor/autoload.php';
//include  $_SERVER["DOCUMENT_ROOT"]. '/vendor/autoload.php';
//require 'vendor/autoload.php';
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile("$path/$fname");
$text   = $pdf->getText();
$pdf    = $parser->parseFile("vendor/smalot/pdfparser/samples/1.pdf");
$text   = $pdf->getText();
echo $text;
exit();

On an Ubuntu 16 server I set up a cron job that runs cron.php with the following entry:

 * * * * * /usr/bin/php -q /var/www/html/..../public_html/post/cron.php >>/var/www/html/..../public_html/post/log/cron.php.log 2>&1

The script runs, but the log reports:

Fatal error:  Uncaught Error: Class 'Smalot\PdfParser\Parser' not found in /var/www/html/..../public_html/post/cron.php:161
Stack trace:
#0 /var/www/html/..../public_html/post/cron.php(62): getpart(Resource id #8, 451, Object(stdClass), 2)
#1 /var/www/html/..../public_html/post/cron.php(378): getmsg(Resource id #8, 451)
#2 {main}
  thrown in /var/www/html/..../public_html/post/cron.php on line 161

Here is my autoload.php:

<?php
/*
Using PDFParser without Composer
Folder structure
================
webroot
  pdfdemos
    INV001.pdf # test PDF file to extract text from for demo
    test.php # our operational demo file
  vendor
    autoload.php
    tecnickcom
      tcpdf # unpack v6.2.12 from release at https://github.com/tecnickcom/TCPDF/archive/6.2.12.tar.gz
    smalot
      pdfparser # unpack from git master https://github.com/smalot/pdfparser/archive/master.zip release is 0.9.25 dated 2015-09-15
        docs # optional
        samples # optional
        src
          Smalot
            PdfParser
*/

$vendorDir = 'vendor';
//$vendorDir = $_SERVER["DOCUMENT_ROOT"] . '/vendor';
$tcpdf_files = Array(
    'Datamatrix' => $vendorDir . '/tecnickcom/tcpdf/include/barcodes/datamatrix.php',
    'PDF417' => $vendorDir . '/tecnickcom/tcpdf/include/barcodes/pdf417.php',
    'QRcode' => $vendorDir . '/tecnickcom/tcpdf/include/barcodes/qrcode.php',
    'TCPDF' => $vendorDir . '/tecnickcom/tcpdf/tcpdf.php',
    'TCPDF2DBarcode' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_barcodes_2d.php',
    'TCPDFBarcode' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_barcodes_1d.php',
    'TCPDF_COLORS' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_colors.php',
    'TCPDF_FILTERS' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_filters.php',
    'TCPDF_FONTS' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_fonts.php',
    'TCPDF_FONT_DATA' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_font_data.php',
    'TCPDF_IMAGES' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_images.php',
    'TCPDF_IMPORT' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_import.php',
    'TCPDF_PARSER' => $vendorDir . '/tecnickcom/tcpdf/tcpdf_parser.php',
    'TCPDF_STATIC' => $vendorDir . '/tecnickcom/tcpdf/include/tcpdf_static.php'
);

foreach ($tcpdf_files as $key => $file) {
    include_once $file;
}

include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Parser.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Document.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Header.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/PDFObject.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Page.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Pages.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementArray.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementBoolean.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementString.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementDate.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementHexa.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementMissing.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementName.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementNull.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementNumeric.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementStruct.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Element/ElementXRef.php";

include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/StandardEncoding.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/ISOLatin1Encoding.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/ISOLatin9Encoding.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/MacRomanEncoding.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Encoding/WinAnsiEncoding.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontCIDFontType0.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontCIDFontType2.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontTrueType.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontType0.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/Font/FontType1.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/XObject/Form.php";
include_once  $vendorDir . "/smalot/pdfparser/src/Smalot/PdfParser/XObject/Image.php";

Here is the file that defines the class the log reports as missing, public_html/post/vendor/smalot/pdfparser/src/Smalot/PdfParser/Parser.php:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <sebastien@malot.fr>
 * @date    2017-01-03
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <sebastien@malot.fr>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 *
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\Element\ElementArray;
use Smalot\PdfParser\Element\ElementBoolean;
use Smalot\PdfParser\Element\ElementDate;
use Smalot\PdfParser\Element\ElementHexa;
use Smalot\PdfParser\Element\ElementName;
use Smalot\PdfParser\Element\ElementNull;
use Smalot\PdfParser\Element\ElementNumeric;
use Smalot\PdfParser\Element\ElementString;
use Smalot\PdfParser\Element\ElementXRef;

/**
 * Class Parser
 *
 * @package Smalot\PdfParser
 */
class Parser
{
    /**
     * @var PDFObject[]
     */
    protected $objects = array();

    /**
     *
     */
    public function __construct()
    {

    }

    /**
     * @param $filename
     * @return Document
     * @throws \Exception
     */
    public function parseFile($filename)
    {
        $content = file_get_contents($filename);
        /*
         * 2018/06/20 @doganoo as multiple times a
         * users have complained that the parseFile()
         * method dies silently, it is an better option
         * to remove the error control operator (@) and
         * let the users know that the method throws an exception
         * by adding @throws tag to PHPDoc.
         *
         * See here for an example: https://github.com/smalot/pdfparser/issues/204
         */
        return $this->parseContent($content);
    }

    /**
     * @param $content
     * @return Document
     * @throws \Exception
     */
    public function parseContent($content)
    {
        // Create structure using TCPDF Parser.
        ob_start();
        @$parser = new \TCPDF_PARSER(ltrim($content));
        list($xref, $data) = $parser->getParsedData();
        unset($parser);
        ob_end_clean();

        if (isset($xref['trailer']['encrypt'])) {
            throw new \Exception('Secured pdf file are currently not supported.');
        }

        if (empty($data)) {
            throw new \Exception('Object list not found. Possible secured file.');
        }

        // Create destination object.
        $document      = new Document();
        $this->objects = array();

        foreach ($data as $id => $structure) {
            $this->parseObject($id, $structure, $document);
            unset($data[$id]);
        }

        $document->setTrailer($this->parseTrailer($xref['trailer'], $document));
        $document->setObjects($this->objects);

        return $document;
    }

    protected function parseTrailer($structure, $document)
    {
        $trailer = array();

        foreach ($structure as $name => $values) {
            $name = ucfirst($name);

            if (is_numeric($values)) {
                $trailer[$name] = new ElementNumeric($values, $document);
            } elseif (is_array($values)) {
                $value          = $this->parseTrailer($values, null);
                $trailer[$name] = new ElementArray($value, null);
            } elseif (strpos($values, '_') !== false) {
                $trailer[$name] = new ElementXRef($values, $document);
            } else {
                $trailer[$name] = $this->parseHeaderElement('(', $values, $document);
            }
        }

        return new Header($trailer, $document);
    }

    /**
     * @param string   $id
     * @param array    $structure
     * @param Document $document
     */
    protected function parseObject($id, $structure, $document)
    {
        $header  = new Header(array(), $document);
        $content = '';

        foreach ($structure as $position => $part) {
            switch ($part[0]) {
                case '[':
                    $elements = array();

                    foreach ($part[1] as $sub_element) {
                        $sub_type   = $sub_element[0];
                        $sub_value  = $sub_element[1];
                        $elements[] = $this->parseHeaderElement($sub_type, $sub_value, $document);
                    }

                    $header = new Header($elements, $document);
                    break;

                case '<<':
                    $header = $this->parseHeader($part[1], $document);
                    break;

                case 'stream':
                    $content = isset($part[3][0]) ? $part[3][0] : $part[1];

                    if ($header->get('Type')->equals('ObjStm')) {
                        $match = array();

                        // Split xrefs and contents.
                        preg_match('/^((\d+\s+\d+\s*)*)(.*)$/s', $content, $match);
                        $content = $match[3];

                        // Extract xrefs.
                        $xrefs = preg_split(
                            '/(\d+\s+\d+\s*)/s',
                            $match[1],
                            -1,
                            PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
                        );
                        $table = array();

                        foreach ($xrefs as $xref) {
                            list($id, $position) = explode(' ', trim($xref));
                            $table[$position] = $id;
                        }

                        ksort($table);

                        $ids       = array_values($table);
                        $positions = array_keys($table);

                        foreach ($positions as $index => $position) {
                            $id            = $ids[$index] . '_0';
                            $next_position = isset($positions[$index + 1]) ? $positions[$index + 1] : strlen($content);
                            $sub_content   = substr($content, $position, $next_position - $position);

                            $sub_header         = Header::parse($sub_content, $document);
                            $object             = PDFObject::factory($document, $sub_header, '');
                            $this->objects[$id] = $object;
                        }

                        // It is not necessary to store this content.
                        $content = '';

                        return;
                    }
                    break;

                default:
                    if ($part != 'null') {
                        $element = $this->parseHeaderElement($part[0], $part[1], $document);

                        if ($element) {
                            $header = new Header(array($element), $document);
                        }
                    }
                    break;

            }
        }

        if (!isset($this->objects[$id])) {
            $this->objects[$id] = PDFObject::factory($document, $header, $content);
        }
    }

    /**
     * @param array    $structure
     * @param Document $document
     *
     * @return Header
     * @throws \Exception
     */
    protected function parseHeader($structure, $document)
    {
        $elements = array();
        $count    = count($structure);

        for ($position = 0; $position < $count; $position += 2) {
            $name  = $structure[$position][1];
            $type  = $structure[$position + 1][0];
            $value = $structure[$position + 1][1];

            $elements[$name] = $this->parseHeaderElement($type, $value, $document);
        }

        return new Header($elements, $document);
    }

    /**
     * @param $type
     * @param $value
     * @param $document
     *
     * @return Element|Header
     * @throws \Exception
     */
    protected function parseHeaderElement($type, $value, $document)
    {
        switch ($type) {
            case '<<':
                return $this->parseHeader($value, $document);

            case 'numeric':
                return new ElementNumeric($value, $document);

            case 'boolean':
                return new ElementBoolean($value, $document);

            case 'null':
                return new ElementNull($value, $document);

            case '(':
                if ($date = ElementDate::parse('(' . $value . ')', $document)) {
                    return $date;
                } else {
                    return ElementString::parse('(' . $value . ')', $document);
                }

            case '<':
                return $this->parseHeaderElement('(', ElementHexa::decode($value, $document), $document);

            case '/':
                return ElementName::parse('/' . $value, $document);

            case 'ojbref': // old mistake in tcpdf parser
            case 'objref':
                return new ElementXRef($value, $document);

            case '[':
                $values = array();

                foreach ($value as $sub_element) {
                    $sub_type  = $sub_element[0];
                    $sub_value = $sub_element[1];
                    $values[]  = $this->parseHeaderElement($sub_type, $sub_value, $document);
                }

                return new ElementArray($values, $document);

            case 'endstream':
            case 'obj': //I don't know what it means but got my project fixed.
            case '':
                // Nothing to do with.
                break;

            default:
                throw new \Exception('Invalid type: "' . $type . '".');
        }
    }
}

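The PDF is parsed when I launch cron.php manually, but not when it runs from crontab. I have been stuck on this for 4 days and I don't know where the problem is. Please, I need your advice. Thanks, Emil.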

1 Answer

OK, I found a way to make it work:

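I inserted this statement into cron.php: echo "getcwd=" . getcwd(); and saw that the current working directory was wrong when the script ran under cron. So I moved the crontab to the root directory, so that the current directory is the root, and then adjusted the paths to fit. Thanks to Jeto for the support.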
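
For anyone hitting the same thing, a variation on the fix above is to stop depending on the working directory at all. The following is only a minimal sketch, not tested against the setup in the question: it assumes the script sits next to the vendor/ directory, as cron.php does there, and it uses only built-in PHP features (getcwd(), chdir(), and the __DIR__ magic constant).

```php
<?php
// Sketch only: assumes this file sits next to the vendor/ directory,
// as cron.php does in the question.

// The directory relative paths are resolved against right now. Under cron
// this is often the crontab owner's home directory, not the script's folder.
$before = getcwd();

// Make the script's own directory the working directory, so relative
// paths such as 'vendor/...' resolve the same way under cron and in the browser.
if (!chdir(__DIR__)) {
    error_log('Could not change directory to ' . __DIR__);
    exit(1);
}

echo 'cwd before: ' . $before . PHP_EOL;
echo 'cwd after:  ' . getcwd() . PHP_EOL;

// Build the autoloader path from __DIR__ as well, so this include does not
// depend on the working directory at all.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
} else {
    error_log('autoload.php not found at ' . $autoload);
}
```

Note that the hand-written autoload.php in the question also uses a relative path ($vendorDir = 'vendor';). Setting it to __DIR__ instead would presumably have the same effect for the include_once lines in that file, though that is an assumption rather than something verified here.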

Answered 2020-01-03T22:32:35.757