php - Problem downloading some files with PHP curl

Question

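I have a script that downloads PDF files after logging in to another site. So far it has worked well for every site, but the new site I'm scraping behaves oddly: some of the downloaded files are 1 KB (i.e. the download failed), while others are fine. Using the download link in a browser opens the "Do you want to save this file?" dialog, and the file saved that way is correct.

Here is my code (I've included the general curl options used throughout the scraping, and the final part where I try to download the files):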

//Initial connection to login page
$header = array();
$header[] = 'Host: www.domain.com';
$header[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$header[] = 'Accept-Language: en-US,en;q=0.5';
$header[] = 'Connection: keep-alive';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/login');
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieLocation);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieLocation);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$webpage = curl_exec($ch);

//Then several operations to login, grab the list of links to PDF download files (...)

//Loop through the array containing the url of the file to download and save it to a folder (writable)
curl_setopt($ch, CURLOPT_POST, false);
foreach($foundBills as $key => $bill)
{
    curl_setopt($ch, CURLOPT_URL, $bill['url']);
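    //CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3; CURLOPT_RETURNTRANSFER already returns the raw bytes
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);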
    $pdfFile = curl_exec($ch);
    $randomFileName = rand_string(20); //generates a 20 char long random string
    $newPDF = $userBillsRoot.$randomFileName.'.pdf';
    write_file($newPDF, $pdfFile, 'wb'); //using a Codeigniter function to save the file
}

The files are each under 1 MB. Any ideas? How can I see more details on why it's not working (for example, a timeout)? Thanks!
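One way to get more detail might be to ask curl what happened right after each curl_exec() call. The block below is only a minimal sketch under stated assumptions: the helper names looks_like_pdf() and describe_transfer() are hypothetical and not part of the script above, and the check simply assumes a valid PDF starts with the bytes "%PDF-". It uses curl_errno(), curl_error() and curl_getinfo() to report transport errors (such as a timeout), the HTTP status, the content type, and the final URL after redirects.

```php
<?php
// Hypothetical helper: true when the body starts with the PDF magic bytes.
function looks_like_pdf($body)
{
    return is_string($body) && strncmp($body, '%PDF-', 5) === 0;
}

// Hypothetical helper: collects what curl knows about the last transfer on $ch.
function describe_transfer($ch, $body)
{
    return array(
        'errno'        => curl_errno($ch),  // 0 means no transport-level error
        'error'        => curl_error($ch),  // human-readable message, e.g. for a timeout
        'http_code'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'content_type' => curl_getinfo($ch, CURLINFO_CONTENT_TYPE),
        'final_url'    => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'total_time'   => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
        'bytes'        => is_string($body) ? strlen($body) : 0,
        'is_pdf'       => looks_like_pdf($body),
    );
}
```

Inside the download loop this could be called as $info = describe_transfer($ch, $pdfFile); and, when $info['is_pdf'] is false, logged with error_log(print_r($info, true)) together with the first few hundred bytes of $pdfFile. A 1 KB response is often an HTML error or login page rather than a PDF, so looking at the body itself may be as informative as the curl metadata.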
