
I'm trying to scrape the betfair.com website in PHP with the following code:

<?php    
    // Defining the basic cURL function
    function curl($url) {
        $ch = curl_init();  // Initialising cURL
        curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
        $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
        curl_close($ch);    // Closing cURL
        return $data;   // Returning the data from the function
    }    

    $scraped_website = curl("https://www.betfair.com/exchange/football");       
    echo $scraped_website;          
?>  

Written this way, the code works.

However, if I use "https://www.betfair.com/exchange/football/event?id=28040884" instead of "https://www.betfair.com/exchange/football", the code stops working.

Please help.


1 Answer

Look at the headers curl receives:

 HTTP/1.1 302 Moved Temporarily
 Location: https://www.betfair.com/exchange/plus/#/football/event/28040884
 Cache-Control: no-cache
 Pragma: no-cache
 Date: Fri, 09 Dec 2016 17:38:52 GMT
 Age: 0
 Transfer-Encoding: chunked
 Connection: keep-alive
 Server: ATS/5.2.1
 Set-Cookie: vid=00956994-084c-444b-ad26-38b1119f4e38; Domain=.betfair.com; Expires=Mon, 01-Dec-2022 09:00:00 GMT; Path=/
 X-Opaque-UUID: 80506a77-12c1-4c89-b4a6-fa499fd23895
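One way to see these headers yourself is to ask curl to return them (CURLOPT_HEADER) while leaving redirect following off, then cut the header block out of the response with CURLINFO_HEADER_SIZE. The sketch below assumes PHP 7.1+ with the curl extension; fetch_headers() and redirect_target() are hypothetical helper names, not part of the original answer.

```php
<?php
// Sketch only: fetch_headers() and redirect_target() are hypothetical helper names.

// Return the raw response headers for $url WITHOUT following redirects
// (CURLOPT_FOLLOWLOCATION is left at its default, false).
function fetch_headers(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);         // prepend the headers to the returned string
    $response = curl_exec($ch);
    if ($response === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE); // length of the header block in bytes
    curl_close($ch);
    return substr($response, 0, $headerSize);
}

// If the status line is a 3xx redirect, return its Location header; otherwise null.
function redirect_target(string $headers): ?string
{
    $lines = preg_split('/\r?\n/', trim($headers));
    if (!preg_match('#^HTTP/\S+\s+3\d\d\b#', $lines[0])) {
        return null; // not a redirect
    }
    foreach ($lines as $line) {
        $line = ltrim($line);
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null; // 3xx status but no Location header
}
```

For example, redirect_target(fetch_headers('https://www.betfair.com/exchange/football/event?id=28040884')) should return the Location value, provided the server still answers with a 302 as in the dump above.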

In fact, https://www.betfair.com/exchange/football/event?id=28040884 answers with a 302 Moved Temporarily HTTP redirect, and your script does not follow redirects, which is why it doesn't work. Fix that (with CURLOPT_FOLLOWLOCATION) and your code works fine. Fixed code:

function curl($url) {
    $ch = curl_init();  // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
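    curl_setopt($ch, CURLOPT_VERBOSE, TRUE); // Print request/response details to STDERR (for debugging only)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow 3xx redirects such as the 302 above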
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}
var_dump(curl("https://www.betfair.com/exchange/football/event?id=28040884"));

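(I'd also recommend setting CURLOPT_ENCODING => ''. When the server supports it, this makes curl use a compressed transfer. HTML compresses very well with gzip, and curl is usually compiled with gzip support, so the page downloads faster and curl_exec() returns sooner.)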
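A minimal sketch of the answer's curl() function with that option added, using curl_setopt_array() to set everything at once. It assumes the curl extension is loaded; scraper_options() is a hypothetical name used only for illustration, and CURLOPT_VERBOSE is left out because it only matters while debugging.

```php
<?php
// Sketch: options for the answer's curl() helper, including the suggested CURLOPT_ENCODING.
// scraper_options() is a hypothetical name, not from the original answer.
function scraper_options(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true, // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 3xx redirects such as the 302 above
        CURLOPT_ENCODING       => '',   // '' = advertise every encoding libcurl supports; the response is decoded automatically
    ];
}

function curl(string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, scraper_options());
    $data = curl_exec($ch); // page body as a string, or false on failure
    curl_close($ch);
    return $data;
}
```

Because libcurl decompresses the body itself, the string returned by curl() is plain HTML either way; only the bytes on the wire shrink.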

answered 2016-12-09T17:43:47.720