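I am using a PHP script to download an XML file from an external URL with curl, but I have run into a problem: curl sometimes fails to download the complete file. The problem occurs more often when I run the script through cron on my hosting server.

Here is the script: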

<?php
header('Content-type:text/html; charset=utf-8');

//Initialize the count of xml download attempts
$xml_dl_attempts = 0;

//set filename of output xml file
$findex = 0;
while(file_exists("xml".$findex.".xml"))
{
    $findex++;
}
$filename = "xml".$findex.".xml";

//Filename for log file
$logfilename = "log.txt";

//Open (append) logfile for write.
$logfileout = fopen($logfilename, 'a');
fwrite($logfileout, "Starting attempts to download the xml file at ".date("H:i:s Y-m-d")."\r\n");

//Attempt to download xml file 8 times
do {
    //Sleep 3 seconds before retrying download
    if($xml_dl_attempts > 0 ) sleep(3);

    //Increase number of download attempts
    $xml_dl_attempts++;
    //Write to logfile
    fwrite($logfileout, date("H:i:s Y-m-d").": Download attempt number ".$xml_dl_attempts.": ");

    //Download xml file using curl
    $ch = curl_init();
    $url = 'http://www.opap.gr/web/services/rs/betting/availableBetGames/sport/program/4100/0/sport-1.xml?localeId=el_GR';

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    set_time_limit(300); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 300);

    $outfile = fopen($filename, 'w');
    if (!$outfile)
    {
        exit;
    }
    curl_setopt($ch, CURLOPT_FILE, $outfile);

    if(curl_exec($ch)==false)
    {
        fwrite($logfileout, "curl_error: ".curl_error($ch));
    }
    fclose($outfile);
    curl_close($ch);

    //Clear errors
    libxml_use_internal_errors(true);
    libxml_clear_errors();

    //Parse xml file
    $xml = simplexml_load_file($filename);

    //Check for errors
    if($err = libxml_get_last_error())
    {
        fwrite($logfileout, "failed\r\n");
    }
} while($err !== false && $xml_dl_attempts < 8); //repeat if xml was not completely downloaded

//Check if the last attempt succeeded
if(!$err)
{
    fwrite($logfileout, "successfull\r\n");
}
fwrite($logfileout, "End.\r\n");
fclose($logfileout);
?>

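As you can see, I check whether the SimpleXML parser reports an error when parsing the downloaded XML file. If an error occurs, I repeat the process, up to a maximum of 8 attempts. I also write a log file.

Here is the log file for a whole day: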

Starting attempts to download the xml file at 18:35:00 2012-09-25

18:35:00 2012-09-25: Download attempt number : failed

18:35:03 2012-09-25: Download attempt number : failed

18:35:07 2012-09-25: Download attempt number : successfull

End.

Starting attempts to download the xml file at 19:35:00 2012-09-25

19:35:00 2012-09-25: Download attempt number 1: failed

19:35:03 2012-09-25: Download attempt number 2: failed

19:35:06 2012-09-25: Download attempt number 3: failed

19:35:10 2012-09-25: Download attempt number 4: failed

19:35:13 2012-09-25: Download attempt number 5: failed

19:35:16 2012-09-25: Download attempt number 6: failed

19:35:20 2012-09-25: Download attempt number 7: failed

19:35:23 2012-09-25: Download attempt number 8: successfull

End.

Starting attempts to download the xml file at 20:35:00 2012-09-25

20:35:00 2012-09-25: Download attempt number 1: failed

20:35:04 2012-09-25: Download attempt number 2: failed

20:35:08 2012-09-25: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 21:35:00 2012-09-25

21:35:00 2012-09-25: Download attempt number 1: failed

21:35:04 2012-09-25: Download attempt number 2: failed

21:35:07 2012-09-25: Download attempt number 3: failed

21:35:11 2012-09-25: Download attempt number 4: successfull

End.

Starting attempts to download the xml file at 22:35:00 2012-09-25

22:35:00 2012-09-25: Download attempt number 1: failed

22:35:04 2012-09-25: Download attempt number 2: failed

22:35:07 2012-09-25: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 23:35:00 2012-09-25

23:35:00 2012-09-25: Download attempt number 1: failed

23:35:03 2012-09-25: Download attempt number 2: failed

23:35:07 2012-09-25: Download attempt number 3: failed

23:35:10 2012-09-25: Download attempt number 4: failed

23:35:14 2012-09-25: Download attempt number 5: failed

23:35:17 2012-09-25: Download attempt number 6: failed

23:35:21 2012-09-25: Download attempt number 7: successfull

End.

Starting attempts to download the xml file at 00:35:00 2012-09-26

00:35:00 2012-09-26: Download attempt number 1: successfull

End.

Starting attempts to download the xml file at 01:35:00 2012-09-26

01:35:00 2012-09-26: Download attempt number 1: failed

01:35:04 2012-09-26: Download attempt number 2: failed

01:35:07 2012-09-26: Download attempt number 3: failed

01:35:11 2012-09-26: Download attempt number 4: failed

01:35:14 2012-09-26: Download attempt number 5: failed

01:35:18 2012-09-26: Download attempt number 6: failed

01:35:21 2012-09-26: Download attempt number 7: failed

01:35:30 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 02:35:00 2012-09-26

02:35:00 2012-09-26: Download attempt number 1: failed

02:35:03 2012-09-26: Download attempt number 2: failed

02:35:07 2012-09-26: Download attempt number 3: failed

02:35:10 2012-09-26: Download attempt number 4: failed

02:35:13 2012-09-26: Download attempt number 5: failed

02:35:17 2012-09-26: Download attempt number 6: failed

02:35:20 2012-09-26: Download attempt number 7: failed

02:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 03:35:00 2012-09-26

03:35:00 2012-09-26: Download attempt number 1: failed

03:35:04 2012-09-26: Download attempt number 2: failed

03:35:07 2012-09-26: Download attempt number 3: failed

03:35:10 2012-09-26: Download attempt number 4: failed

03:35:14 2012-09-26: Download attempt number 5: failed

03:35:17 2012-09-26: Download attempt number 6: failed

03:35:21 2012-09-26: Download attempt number 7: failed

03:35:30 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 04:35:00 2012-09-26

04:35:00 2012-09-26: Download attempt number 1: failed

04:35:03 2012-09-26: Download attempt number 2: failed

04:35:07 2012-09-26: Download attempt number 3: failed

04:35:10 2012-09-26: Download attempt number 4: failed

04:35:14 2012-09-26: Download attempt number 5: failed

04:35:17 2012-09-26: Download attempt number 6: failed

04:35:21 2012-09-26: Download attempt number 7: failed

04:35:24 2012-09-26: Download attempt number 8: successfull

End.

Starting attempts to download the xml file at 05:35:00 2012-09-26

05:35:00 2012-09-26: Download attempt number 1: failed

05:35:04 2012-09-26: Download attempt number 2: failed

05:35:08 2012-09-26: Download attempt number 3: failed

05:35:11 2012-09-26: Download attempt number 4: failed

05:35:15 2012-09-26: Download attempt number 5: failed

05:35:18 2012-09-26: Download attempt number 6: failed

05:35:22 2012-09-26: Download attempt number 7: failed

05:35:25 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 06:35:00 2012-09-26

06:35:00 2012-09-26: Download attempt number 1: failed

06:35:03 2012-09-26: Download attempt number 2: failed

06:35:07 2012-09-26: Download attempt number 3: failed

06:35:10 2012-09-26: Download attempt number 4: failed

06:35:14 2012-09-26: Download attempt number 5: failed

06:35:17 2012-09-26: Download attempt number 6: failed

06:35:21 2012-09-26: Download attempt number 7: failed

06:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 07:35:00 2012-09-26

07:35:00 2012-09-26: Download attempt number 1: failed

07:35:04 2012-09-26: Download attempt number 2: failed

07:35:07 2012-09-26: Download attempt number 3: failed

07:35:11 2012-09-26: Download attempt number 4: failed

07:35:14 2012-09-26: Download attempt number 5: failed

07:35:18 2012-09-26: Download attempt number 6: failed

07:35:21 2012-09-26: Download attempt number 7: failed

07:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 08:35:00 2012-09-26

08:35:00 2012-09-26: Download attempt number 1: failed

08:35:03 2012-09-26: Download attempt number 2: failed

08:35:06 2012-09-26: Download attempt number 3: failed

08:35:10 2012-09-26: Download attempt number 4: failed

08:35:13 2012-09-26: Download attempt number 5: failed

08:35:16 2012-09-26: Download attempt number 6: failed

08:35:20 2012-09-26: Download attempt number 7: failed

08:35:23 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 09:35:00 2012-09-26

09:35:00 2012-09-26: Download attempt number 1: failed

09:35:04 2012-09-26: Download attempt number 2: failed

09:35:07 2012-09-26: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 10:35:00 2012-09-26

10:35:00 2012-09-26: Download attempt number 1: failed

10:35:03 2012-09-26: Download attempt number 2: failed

10:35:06 2012-09-26: Download attempt number 3: failed

10:35:10 2012-09-26: Download attempt number 4: failed

10:35:13 2012-09-26: Download attempt number 5: failed

10:35:17 2012-09-26: Download attempt number 6: failed

10:35:20 2012-09-26: Download attempt number 7: successfull

End.

Starting attempts to download the xml file at 11:35:00 2012-09-26

11:35:00 2012-09-26: Download attempt number 1: failed

11:35:03 2012-09-26: Download attempt number 2: failed

11:35:07 2012-09-26: Download attempt number 3: successfull

End.

Starting attempts to download the xml file at 12:35:00 2012-09-26

12:35:00 2012-09-26: Download attempt number 1: failed

12:35:04 2012-09-26: Download attempt number 2: failed

12:35:07 2012-09-26: Download attempt number 3: failed

12:35:11 2012-09-26: Download attempt number 4: failed

12:35:14 2012-09-26: Download attempt number 5: failed

12:35:17 2012-09-26: Download attempt number 6: failed

12:35:21 2012-09-26: Download attempt number 7: successfull

End.

Starting attempts to download the xml file at 13:35:00 2012-09-26

13:35:00 2012-09-26: Download attempt number 1: failed

13:35:03 2012-09-26: Download attempt number 2: successfull

End.

Starting attempts to download the xml file at 14:35:00 2012-09-26

14:35:00 2012-09-26: Download attempt number 1: failed

14:35:03 2012-09-26: Download attempt number 2: failed

14:35:07 2012-09-26: Download attempt number 3: failed

14:35:10 2012-09-26: Download attempt number 4: successfull

End.

Starting attempts to download the xml file at 15:35:00 2012-09-26

15:35:00 2012-09-26: Download attempt number 1: failed

15:35:03 2012-09-26: Download attempt number 2: failed

15:35:07 2012-09-26: Download attempt number 3: failed

15:35:10 2012-09-26: Download attempt number 4: failed

15:35:13 2012-09-26: Download attempt number 5: failed

15:35:17 2012-09-26: Download attempt number 6: failed

15:35:20 2012-09-26: Download attempt number 7: failed

15:35:24 2012-09-26: Download attempt number 8: failed

End.

Starting attempts to download the xml file at 16:35:00 2012-09-26

16:35:00 2012-09-26: Download attempt number 1: failed

16:35:03 2012-09-26: Download attempt number 2: failed

16:35:07 2012-09-26: Download attempt number 3: successfull

End.

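The problem is that sometimes it manages to get the complete file after a few attempts, and other times it fails completely. Note also that curl_exec does not return an error when the XML is incomplete.

Unfortunately, the server hosting the XML does not support range requests, so I cannot resume the download when the file is incomplete. I could raise the attempt limit to, say, 50, but the failed attempts still download some data. For a 1 MB XML file, if the script fails 30 times and downloads 500 KB each time, it will have downloaded 16 MB of data (30 × 500 KB plus the final 1 MB) for one successful attempt. I want to run this script every hour, so I believe this would hurt my server's bandwidth.

Why does curl fail to download the complete file? Is there some option that makes it behave like a browser, which always gets the file in the end?

Thanks.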

1 Answer

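The problem is at your source: the server.

I tried running your scraper on ScraperWiki, and this is what it showed:

[First screenshot]

The same problem also occurred when I tried loading the XML myself; it only worked on the third try.

You can see in the image below that the server closes the connection during the first two requests, but not during the third (successful) one.

[Second screenshot]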
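
If you want to see this from the PHP side, one thing to try is comparing the size the server advertised with the number of bytes that actually arrived. This is only a rough sketch: transferStatus is a name I made up, and it assumes you pass it the array returned by curl_getinfo($ch) after curl_exec(). If the server sends no Content-Length header (curl reports -1 in that case), the sizes cannot tell you anything and you still need the XML parse check from your script.

```php
<?php
// Hypothetical helper (not part of curl): classifies a finished transfer
// from the array returned by curl_getinfo($ch).
function transferStatus(array $info): string
{
    // curl reports -1 when the response carried no Content-Length header.
    $expected = (int) ($info['download_content_length'] ?? -1);
    $received = (int) ($info['size_download'] ?? 0);

    if ($expected < 0) {
        return 'unknown';   // no advertised size to compare against
    }
    return $received >= $expected ? 'complete' : 'truncated';
}

// Usage with made-up numbers: 1,000,000 bytes advertised, 500,000 received.
echo transferStatus([
    'download_content_length' => 1000000,
    'size_download'           => 500000,
]), "\n"; // prints "truncated"
```

In your loop you would call it as transferStatus(curl_getinfo($ch)) before curl_close($ch) and write the result to the log next to the attempt number.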

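So the problem is on the server, and if it is not yours, there is nothing you can do about it. (Except, of course, bringing it to the attention of their server admin!)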
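
What you can still do on your side is make sure a truncated download never ends up as one of your saved files. The sketch below is an assumption about how you might structure that, not a fix for the server: saveIfValidXml is a hypothetical name, and it expects the response body as a string (for example from curl_exec() with CURLOPT_RETURNTRANSFER and without CURLOPT_FILE). It parses the body first and only then writes it to a temporary file and renames it into place.

```php
<?php
// Hypothetical helper: writes $body to $target only if it parses as XML.
// Returns true when the file was saved, false otherwise.
function saveIfValidXml(string $body, string $target): bool
{
    // Collect libxml errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($body);
    $ok  = ($xml !== false);

    // Restore the caller's libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return false; // truncated or otherwise malformed: keep nothing
    }

    // Write to a temporary name first, then move it into place.
    $tmp = $target . '.tmp';
    if (file_put_contents($tmp, $body) === false) {
        return false;
    }
    return rename($tmp, $target);
}
```

A call such as saveIfValidXml($body, $filename) would replace the fopen/CURLOPT_FILE part of your loop, and its return value can drive the retry condition instead of libxml_get_last_error().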

Note: I believe ScraperWiki's internet connection is very good, since many people rely on it. So you can safely blame this on a server fault. #jboss

answered 2012-09-26T17:57:26.103