I'm trying to use a package called Goutte (a PHP scraper/web crawler), like this:
<?php
// Init
require_once 'vendor/autoload.php';
use Goutte\Client;

$client = new Client();
$reviews = array();

// Parse Review Site
$crawler = $client->request('GET', 'http://review-site-url-here');
$crawler->filter('div.review')->each(function($node) use ($reviews)
{
    // Parse Data
    $player_name = $node->filter('tr.switch > td > a')->first()->text();
    // other fields

    // Build Reviews
    array_push($reviews, [
        'player_name' => $player_name,
        // other fields
    ]);
});

// Debug
echo "<pre>";
print_r($reviews);
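When this script runs, the $reviews array is always empty. However, if I call print_r inside the anonymous function, it seems to show only the current element on each iteration. For example, with 4 reviews, if I do this: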
// Parse Review Site
$crawler = $client->request('GET', 'http://review-site-url-here');
$crawler->filter('div.review-BL-mid')->each(function($node) use ($reviews)
{
    // Parse Data
    $player_name = $node->filter('tr.switch > td > a')->first()->text();
    // other fields

    // Build Reviews
    array_push($reviews, [
        'player_name' => $player_name,
        // other fields
    ]);

    // Debug
    print_r($reviews);
});
It outputs the following:
Array
(
    [0] => Array
        (
            [player_name] => aaaa
        )
)
Array
(
    [0] => Array
        (
            [player_name] => bbb
        )
)
Array
(
    [0] => Array
        (
            [player_name] => ccc
        )
)
Array
(
    [0] => Array
        (
            [player_name] => ddd
        )
)
It looks as if the array is never updated from inside the anonymous function. Any idea how to fix this?
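
To narrow it down, here is a minimal sketch that reproduces the same symptom without Goutte. It assumes the cause is that a PHP closure copies each `use` variable by value unless it is written with `&`. The `$names` list and the two closure names are hypothetical stand-ins for the scraped data, not part of the original script.

```php
<?php
// Hypothetical stand-in for the scraped player names; no Goutte involved.
$names = ['aaaa', 'bbb', 'ccc', 'ddd'];

// Captured by value: every call starts from the copy taken when the
// closure was created (an empty array), so the outer array never changes.
$byValue = [];
$collectByValue = function ($name) use ($byValue) {
    $byValue[] = ['player_name' => $name];
};

// Captured by reference: the closure appends to the outer array itself.
$byReference = [];
$collectByReference = function ($name) use (&$byReference) {
    $byReference[] = ['player_name' => $name];
};

foreach ($names as $name) {
    $collectByValue($name);
    $collectByReference($name);
}

echo count($byValue), "\n";     // 0
echo count($byReference), "\n"; // 4
```

If this is the same thing that happens in the script above, changing `use ($reviews)` to `use (&$reviews)` in the callback passed to each() would probably be enough for $reviews to accumulate all four entries.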