php - How to read large worksheets from large Excel files (27MB+) with PHPExcel?

Question

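I have large Excel worksheets that I want to be able to read into MySQL using PHPExcel.

I am using the recent patch that lets you read worksheets without opening the entire file. This way I can read one worksheet at a time.

However, one Excel file is 27MB. I can successfully read the first worksheet because it is small, but the second worksheet is so large that the cron job that starts the process at 22:00 has still not finished at 8:00 AM. The worksheet is simply too big.

Is there any way to read a worksheet row by row, something like this: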

$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);

foreach ($worksheetNames as $sheetName) {
    //BELOW IS "WISH CODE" (getWorksheetWithRows() does not exist):
    for ($row = 1; $row <= $max_rows; $row += 100) {
        $dataset = $objReader->getWorksheetWithRows($row, $row+100);
        save_dataset_to_database($dataset);
    }
}
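The loop the question wishes for boils down to walking the sheet in fixed row windows. Below is a minimal sketch of computing those windows. `chunkRanges()` is a hypothetical helper, not a PHPExcel API. Each window could then be handed to a read filter like the one in the accepted answer.

```php
<?php
/**
 * Hypothetical helper (not a PHPExcel API): yields [firstRow, lastRow]
 * windows of at most $chunkSize rows covering $firstRow..$maxRows.
 */
function chunkRanges(int $firstRow, int $maxRows, int $chunkSize): Generator
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be >= 1');
    }
    for ($start = $firstRow; $start <= $maxRows; $start += $chunkSize) {
        // Windows do not overlap; the last one is clipped to $maxRows
        yield [$start, min($start + $chunkSize - 1, $maxRows)];
    }
}

foreach (chunkRanges(1, 250, 100) as [$first, $last]) {
    echo "rows $first-$last\n"; // rows 1-100, rows 101-200, rows 201-250
}
```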

Addendum

@mark, I used the code you posted to create the following example:

function readRowsFromWorksheet() {

    $file_name = htmlentities($_POST['file_name']);
    $file_type = htmlentities($_POST['file_type']);

    echo 'Read rows from worksheet:<br />';
    debug_log('----------start');
    $objReader = PHPExcel_IOFactory::createReader($file_type);
    $chunkSize = 20;
    $chunkFilter = new ChunkReadFilter();
    $objReader->setReadFilter($chunkFilter);

    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
        $chunkFilter->setRows($startRow, $chunkSize);
        $objPHPExcel = $objReader->load('data/' . $file_name);
        debug_log('reading chunk starting at row '.$startRow);
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        var_dump($sheetData);
        echo '<hr />';
    }
    debug_log('end');
}

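As the log below shows, it works fine on a small 8 KB Excel file. When I run it on a 3 MB Excel file, however, it never gets past the first chunk. Is there any way to optimize this code's performance? Otherwise it does not seem fast enough to read chunks from large Excel files: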

2011-01-12 11:07:15: ----------start
2011-01-12 11:07:15: reading chunk starting at row 2
2011-01-12 11:07:15: reading chunk starting at row 22
2011-01-12 11:07:15: reading chunk starting at row 42
2011-01-12 11:07:15: reading chunk starting at row 62
2011-01-12 11:07:15: reading chunk starting at row 82
2011-01-12 11:07:15: reading chunk starting at row 102
2011-01-12 11:07:15: reading chunk starting at row 122
2011-01-12 11:07:15: reading chunk starting at row 142
2011-01-12 11:07:15: reading chunk starting at row 162
2011-01-12 11:07:15: reading chunk starting at row 182
2011-01-12 11:07:15: reading chunk starting at row 202
2011-01-12 11:07:15: reading chunk starting at row 222
2011-01-12 11:07:15: end
2011-01-12 11:07:52: ----------start
2011-01-12 11:08:01: reading chunk starting at row 2
(...at 11:18, CPU usage at 93% still running...)

Addendum 2

When I comment out:

//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);

it parses at an acceptable speed (about 2 rows per second). Is there any way to improve the performance of toArray()?

2011-01-12 11:40:51: ----------start
2011-01-12 11:40:59: reading chunk starting at row 2
2011-01-12 11:41:07: reading chunk starting at row 22
2011-01-12 11:41:14: reading chunk starting at row 42
2011-01-12 11:41:22: reading chunk starting at row 62
2011-01-12 11:41:29: reading chunk starting at row 82
2011-01-12 11:41:37: reading chunk starting at row 102
2011-01-12 11:41:45: reading chunk starting at row 122
2011-01-12 11:41:52: reading chunk starting at row 142
2011-01-12 11:42:00: reading chunk starting at row 162
2011-01-12 11:42:07: reading chunk starting at row 182
2011-01-12 11:42:15: reading chunk starting at row 202
2011-01-12 11:42:22: reading chunk starting at row 222
2011-01-12 11:42:22: end

Addendum 3

This seems to work well, at least on the 3 MB file:

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load('data/' . $file_name);
    debug_log('reading chunk starting at row ' . $startRow);
    foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false);
        echo '<tr>';
        foreach ($cellIterator as $cell) {
            if (!is_null($cell)) {
                //$value = $cell->getCalculatedValue();
                $rawValue = $cell->getValue();
                debug_log($rawValue);
            }
        }
    }
}

score 11 · Accepted Answer

You can read a worksheet in "chunks" using a Read Filter, although I can't make any guarantees about efficiency.

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example2.xls';


/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}


echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/**  Create a new Reader of the type defined in $inputFileType  **/

$objReader = PHPExcel_IOFactory::createReader($inputFileType);



echo '<hr />';


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 20;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);

/**  Loop to read our worksheet in "chunk size" blocks  **/
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
    $chunkFilter->setRows($startRow,$chunkSize);
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
    $objPHPExcel = $objReader->load($inputFileName);

    //    Do some processing here

    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
    var_dump($sheetData);
    echo '<br /><br />';
}

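Note that this Read Filter will always read the first row of the worksheet, as well as the rows defined by the chunk rule.

When using a read filter, PHPExcel still parses the entire file, but it only loads the cells that match the defined read filter. It therefore only uses the memory required for that number of cells. However, it parses the file multiple times, once for each chunk, so it will be slower. This example reads 20 rows at a time; to read row by row, simply set $chunkSize to 1.

This can also cause problems if you have formulas that reference cells in a different "chunk", because the data simply isn't available for cells outside the current "chunk".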
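To make the filter's row rule concrete, here is a standalone copy of the row test in `chunkReadFilter::readCell()`. It drops the `PHPExcel_Reader_IReadFilter` interface so it runs without PHPExcel, and `rowPassesFilter()` is a hypothetical name. For the second chunk (`$startRow = 22`, `$chunkSize = 20`), it keeps the heading row plus rows 22 to 41. The end row is exclusive.

```php
<?php
/**
 * Standalone copy of chunkReadFilter::readCell()'s row test, without the
 * PHPExcel_Reader_IReadFilter interface so it runs on its own.
 * Row 1 (headings) always passes; other rows pass when
 * $startRow <= $row < $startRow + $chunkSize.
 */
function rowPassesFilter(int $row, int $startRow, int $chunkSize): bool
{
    $endRow = $startRow + $chunkSize; // exclusive upper bound
    return $row == 1 || ($row >= $startRow && $row < $endRow);
}

// Rows among 1..60 that the second chunk (startRow 22, chunkSize 20) would load
$loaded = array_values(array_filter(
    range(1, 60),
    fn(int $r): bool => rowPassesFilter($r, 22, 20)
));
echo $loaded[0], ', ', $loaded[1], ' ... ', $loaded[count($loaded) - 1], "\n"; // 1, 22 ... 41
```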
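Each loop iteration calls `$objReader->load()`, so the number of full parses equals the number of chunks. The sketch below uses a hypothetical `countLoads()` helper to count them for the example loop. With `$startRow` running from 2 to 240 in steps of 20, it gives 12 parses, which matches the 12 "reading chunk" lines in the questioner's log. Setting `$chunkSize` to 1 over the same rows raises that to 239.

```php
<?php
/**
 * Hypothetical helper: how many times the chunk loop calls load(),
 * i.e. how many full parses of the file it triggers.
 */
function countLoads(int $firstRow, int $lastRow, int $chunkSize): int
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be >= 1');
    }
    if ($lastRow < $firstRow) {
        return 0;
    }
    // Iterations of: for ($r = $firstRow; $r <= $lastRow; $r += $chunkSize)
    return intdiv($lastRow - $firstRow, $chunkSize) + 1;
}

echo countLoads(2, 240, 20), "\n"; // 12 parses
echo countLoads(2, 240, 1), "\n";  // 239 parses
```

Larger chunks mean fewer parses but more cells held in memory at once, so the chunk size trades speed against memory.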

score 4 · Accepted Answer

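Currently, the best option for reading .xlsx, .csv and .ods files is spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), because it can read a file without loading all of it into memory. For the .xls extension it has limitations, because it uses PHPExcel for reading.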

score 1 · Accepted Answer

Here is ChunkReadFilter.php:

<?php
Class ChunkReadFilter implements PHPExcel_Reader_IReadFilter {

    private $_startRow = 0;
    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {

        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {

            return true;
        }
        return false;
    }

}
?>

And this is index.php, an imperfect but basic implementation, with the calling code at the end of the file.

<?php

require_once './Classes/PHPExcel/IOFactory.php';
require_once 'ChunkReadFilter.php';

class Excelreader {

    /**
     * This function is used to read data from excel file in chunks and insert into database
     * @param string $filePath
     * @param integer $chunkSize
     */
    public function readFileAndDumpInDB($filePath, $chunkSize) {
        echo("Loading file " . $filePath . " ....." . PHP_EOL);
        /**  Create a new Reader of the type that has been identified  * */
        $objReader = PHPExcel_IOFactory::createReader(PHPExcel_IOFactory::identify($filePath));

        $spreadsheetInfo = $objReader->listWorksheetInfo($filePath);

        /**  Create a new Instance of our Read Filter  * */
        $chunkFilter = new ChunkReadFilter();

        /**  Tell the Reader that we want to use the Read Filter that we've Instantiated  * */
        $objReader->setReadFilter($chunkFilter);
        $objReader->setReadDataOnly(true);
        //$objReader->setLoadSheetsOnly("Sheet1");
        //get header column name
        $chunkFilter->setRows(0, 1);
        echo("Reading file " . $filePath . PHP_EOL . "<br>");
        $totalRows = $spreadsheetInfo[0]['totalRows'];
        echo("Total rows in file " . $totalRows . " " . PHP_EOL . "<br>");

        /**  Loop to read our worksheet in "chunk size" blocks  * */
        /**  $startRow is set to 1 initially because we always read the headings in row #1  * */
        for ($startRow = 1; $startRow <= $totalRows; $startRow += $chunkSize) {
            echo("Loading WorkSheet for rows " . $startRow . " to " . ($startRow + $chunkSize - 1) . PHP_EOL . "<br>");
            $i = 0;
            /**  Tell the Read Filter, the limits on which rows we want to read this iteration  * */
            $chunkFilter->setRows($startRow, $chunkSize);
            /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  * */
            $objPHPExcel = $objReader->load($filePath);
            $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, false);

            $startIndex = ($startRow == 1) ? $startRow : $startRow - 1;
            //dumping in database
            if (!empty($sheetData) && $startRow < $totalRows) {
                /**
                 * $this->dumpInDb(array_slice($sheetData, $startIndex, $chunkSize));
                 */

                echo "<table border='1'>";
                foreach ($sheetData as $key => $value) {
                    $i++;
                    if ($value[0] != null) {
                        echo "<tr><td>id:$i</td><td>{$value[0]} </td><td>{$value[1]} </td><td>{$value[2]} </td><td>{$value[3]} </td></tr>";
                    }
                }
                echo "</table><br/><br/>";
            }
            $objPHPExcel->disconnectWorksheets();
            unset($objPHPExcel, $sheetData);
        }
        echo("File " . $filePath . " has been uploaded successfully in database" . PHP_EOL . "<br>");
    }

    /**
     * Insert data into database table
     * Note: DbAdapter is not shown here; it is assumed to return a mysqli connection.
     * @param Array $sheetData rows already sliced to the current chunk (no header row)
     * @return boolean
     * @throws Exception
     */
    protected function dumpInDb($sheetData) {

        $con = DbAdapter::getDBConnection();
        $query = "INSERT INTO employe(name,address) VALUES ";

        $values = array();
        for ($i = 0; $i < count($sheetData); $i++) {
            $values[] = "('" . mysqli_real_escape_string($con, (string) $sheetData[$i][0]) . "',"
                    . "'" . mysqli_real_escape_string($con, (string) $sheetData[$i][1]) . "')";
        }
        if (empty($values)) {
            mysqli_close($con);
            return true; // nothing to insert
        }

        // Value tuples must be separated by commas
        $query .= implode(",", $values);
        $query .= " ON DUPLICATE KEY UPDATE name=VALUES(name), address=VALUES(address)";
        if (mysqli_query($con, $query)) {
            mysqli_close($con);
            return true;
        }
        // Read the error before closing the connection
        $error = mysqli_error($con);
        mysqli_close($con);
        throw new Exception($error);
    }

    /**
     * This function returns list of files corresponding to given directory path
     * @param String $dataFolderPath
     * @return Array list of file
     */
    protected function getFileList($dataFolderPath) {
        if (!is_dir($dataFolderPath)) {
            throw new Exception("Directory " . $dataFolderPath . " does not exist");
        }
        $root = scandir($dataFolderPath);
        $fileList = array();
        foreach ($root as $value) {
            if ($value === '.' || $value === '..') {
                continue;
            }
            if (is_file("$dataFolderPath/$value")) {
                $fileList[] = "$dataFolderPath/$value";
                continue;
            }
        }
        return $fileList;
    }

}

$inputFileName = './prueba_para_batch.xls';
$excelReader = new Excelreader();
$excelReader->readFileAndDumpInDB($inputFileName, 500);
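An alternative to escaping each cell by hand in `dumpInDb()` is a prepared statement with one placeholder per cell. The following is a minimal sketch assuming MySQL (for `ON DUPLICATE KEY UPDATE`) and PDO. `buildBatchInsert()` is a hypothetical helper, and the `employe` table and its columns are taken from the example above.

```php
<?php
/**
 * Hypothetical helper: builds one multi-row
 * INSERT ... ON DUPLICATE KEY UPDATE statement with "?" placeholders
 * and the flat list of values to bind. Table and column names are
 * inserted verbatim, so they must come from trusted code, not the sheet.
 */
function buildBatchInsert(string $table, array $columns, array $rows): array
{
    if ($rows === [] || $columns === []) {
        throw new InvalidArgumentException('need at least one row and column');
    }
    $width = count($columns);
    $tuple = '(' . implode(',', array_fill(0, $width, '?')) . ')';

    $params = [];
    foreach ($rows as $row) {
        $row = array_values($row); // accept 0-based or cell-letter keyed rows
        for ($c = 0; $c < $width; $c++) {
            $params[] = $row[$c] ?? null; // missing cells become NULL
        }
    }

    $updates = array_map(fn(string $col): string => "$col=VALUES($col)", $columns);
    $sql = "INSERT INTO $table (" . implode(',', $columns) . ') VALUES '
        . implode(',', array_fill(0, count($rows), $tuple))
        . ' ON DUPLICATE KEY UPDATE ' . implode(',', $updates);

    return [$sql, $params];
}

[$sql, $params] = buildBatchInsert('employe', ['name', 'address'], [
    ['Ann', 'Main St'],
    ['Bob', 'High St'],
]);
echo $sql, "\n";
// With PDO (connection not shown): $pdo->prepare($sql)->execute($params);
```

Sending one statement per chunk keeps round trips low. MySQL caps a prepared statement at 65,535 placeholders, so rows × columns per chunk has to stay below that.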
