php - Something isn't working in my cURL scraping

Question

I'm trying to use the old (dead) torrentz.eu scraper code to scrape the search results of torrentz2.eu:

When I run http://localhost/jits/torz/api.php?key=kabali, it shows a notice and null:

Notice: Undefined variable: results_urls in /Applications/XAMPP/xamppfiles/htdocs/jits/torz/api.php on line 59
null

Why?

Can anyone tell me what is wrong with the code?

Here is the code:

<?php   
    $t= $_GET['key'];
    // Defining the basic cURL function
    function curl($url) {
        // Assigning cURL options to an array
        $options = Array(
            CURLOPT_RETURNTRANSFER => TRUE,  // Setting cURL's option to return the webpage data
            CURLOPT_FOLLOWLOCATION => TRUE,  // Setting cURL to follow 'location' HTTP headers
            CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer where following 'location' HTTP headers
            CURLOPT_CONNECTTIMEOUT => 120,   // Setting the amount of time (in seconds) before the request times out
            CURLOPT_TIMEOUT => 120,  // Setting the maximum amount of time for cURL to execute queries
            CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
            CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0",  // Setting the useragent
            CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
        );
         
        $ch = curl_init();  // Initialising cURL 
        curl_setopt_array($ch, $options);   // Setting cURL's options using the previously assigned array data in $options
        $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
        curl_close($ch);    // Closing cURL 
        return $data;   // Returning the data from the function 
    }
?>

<?php
    // Defining the basic scraping function
    function scrape_between($data, $start, $end){
        $data = stristr($data, $start); // Stripping all data from before $start
        $data = substr($data, strlen($start));  // Stripping $start
        $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
        $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
        return $data;   // Returning the scraped data from the function
    }
?>
<?php

    $url = "https://torrentz2.eu/search?f=$t";    // Assigning the URL we want to scrape to the variable $url
    $results_page = curl($url); // Downloading the results page using our curl() function
    //var_dump($results_page);
    //die(); 
    $results_page = scrape_between($results_page, "<dl><dt>", "<a href=\"http://www.viewme.com/search?q=$t\" title=\"Web search results on ViewMe\">"); // Scraping out only the middle section of the results page that contains our results 
    $separate_results = explode("</dd></dl>", $results_page);   // Exploding the results into separate parts in an array

    // For each separate result, scrape the URL
    foreach ($separate_results as $separate_result) {
        if ($separate_result != "") {
            $results_urls[] =  scrape_between($separate_result, "\">", "<b>"); // Scraping the text between '">' and '<b>' in each result and adding it to the $results_urls array
     
        }
      
    }
     
    //print_r($results_urls); // Printing out our array of URLs we've just scraped      

    if ($_GET["key"] === null) {
        echo "Keyword Missing ";
    } else if (isset($_GET["key"])) {
        echo json_encode($results_urls);
    }
?>

Reference for the old torrentz.eu scraper code: GIT repo

score 1 · Accepted Answer

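First, you get the notice "Undefined variable: results_urls" because $results_urls is never defined up front: it only comes into existence inside the loop via $results_urls[] = ..., so if the loop never appends anything, the variable does not exist when json_encode() reaches it. Define it first, then use it.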

Do something like this:

    // $results_urls is defined here, before the loop:
    $results_urls = [];
    // For each separate result, scrape the URL
    foreach ($separate_results as $separate_result) {
        if ($separate_result != "") {
            $results_urls[] =  scrape_between($separate_result, "\">", "<b>"); // Scraping the text between '">' and '<b>' in each result and adding it to the $results_urls array

        }

    }
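To see the effect of the initialization, here is a minimal sketch that reuses scrape_between() from the question and feeds it a made-up page that lacks the "<dl><dt>" start marker. The page body and the "</body>" end marker are assumptions chosen only for illustration. Because nothing is appended inside the loop, the script prints an empty JSON array instead of raising the notice and printing null:

```php
<?php
// scrape_between() as defined in the question.
function scrape_between($data, $start, $end){
    $data = stristr($data, $start);         // everything from $start onward (false if absent)
    $data = substr($data, strlen($start));  // drop $start itself
    $stop = stripos($data, $end);           // position of $end (false if absent)
    $data = substr($data, 0, $stop);        // keep only what precedes $end
    return $data;
}

// Made-up page body that does NOT contain the "<dl><dt>" start marker.
$page = "<html><body><p>No results</p></body></html>";

// "</body>" is a stand-in end marker, used only for this illustration.
$results_page = scrape_between($page, "<dl><dt>", "</body>");
$separate_results = explode("</dd></dl>", $results_page);

$results_urls = [];   // defined before the loop, so it always exists
foreach ($separate_results as $separate_result) {
    if ($separate_result != "") {
        $results_urls[] = scrape_between($separate_result, "\">", "<b>");
    }
}

echo json_encode($results_urls);   // prints [] instead of null
```

An empty [] still tells you the scraping found nothing, but the endpoint now always returns valid JSON of the same type.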

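Second, null is printed because $results_urls is never populated, and that is because $separate_results is not populated correctly: it contains only a single empty value. Debugging further, I found that $results_page is false (this happens, for example, when the cURL request fails or when the start marker "<dl><dt>" does not occur in the downloaded HTML). So whatever you are trying to do with the scrape_between function is not working as expected. Fix your function.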
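To make this kind of failure visible instead of silent, one option is to check the return value of curl_exec() (reporting curl_error() and the HTTP status) and to have the scraping helper return null when a marker is missing. A minimal sketch, assuming PHP 7.1+ with the curl extension loaded; fetch_page() and scrape_between_strict() are hypothetical names, and the sample HTML is invented for illustration, not the real torrentz2.eu markup:

```php
<?php
// Hypothetical helper: download $url and throw instead of returning false.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_TIMEOUT        => 60,
    ]);
    $data = curl_exec($ch);
    if ($data === false) {
        $error = curl_error($ch);         // e.g. DNS, TLS or timeout problems
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status returned for $url");
    }
    return $data;
}

// Hypothetical stricter variant of scrape_between(): returns null when
// either marker is missing instead of silently yielding "" or false.
function scrape_between_strict(string $data, string $start, string $end): ?string
{
    $from = stripos($data, $start);
    if ($from === false) {
        return null;                      // start marker not found
    }
    $from += strlen($start);              // skip past the start marker
    $to = stripos($data, $end, $from);    // look for $end only after $start
    if ($to === false) {
        return null;                      // end marker not found
    }
    return substr($data, $from, $to - $from);
}

// Invented sample markup, for illustration only.
$sample = '<dl><dt><a href="/abc">Kabali 2016<b>x</b></a></dt></dl>';

echo json_encode(scrape_between_strict($sample, '">', '<b>')), "\n";              // "Kabali 2016"
echo json_encode(scrape_between_strict('<p>none</p>', '<dl><dt>', '</dt>')), "\n"; // null

// Exploding an empty string gives one empty element, not an empty array,
// which is why $separate_results held a single empty value.
echo json_encode(explode('</dd></dl>', '')), "\n";                                 // [""]
```

In the question's script, $results_page = fetch_page($url); would take the place of curl($url), so a failed request stops with the cURL or HTTP error message instead of carrying false into the scraping step. A null from scrape_between_strict() then tells you directly that the page markup no longer contains the marker you are looking for.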
