0

我是卷曲的新手。我一直在尝试将这个亚马逊链接的内容(即 20 本书的图像、书名、作者和价格)抓取到一个 html 页面中。到目前为止,我已经使用以下代码打印页面

<?php
function curl($url) {
    $options = Array(
        CURLOPT_RETURNTRANSFER => TRUE,
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_AUTOREFERER => TRUE,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT => 120,
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_URL => $url,
    );

    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
?>

$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;

我尝试过使用正则表达式但失败了;我已经连续 6 小时尝试了所有可能的方法并且真的很累,希望我能在这里找到解决方案;仅仅感谢是不够的解决方案,但提前 tq。:)

更新:为像我这样的初学者找到了一个非常有用的网站(点击这里)(虽然不使用 cURL)。

4

1 回答 1

1

您确实应该使用AWSECommerce API,但这里有一种利用 Yahoo 的YQL服务的方法:

<?php
$query = sprintf(
    'http://query.yahooapis.com/v1/public/yql?q=%s',
    urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);

$xml = new SimpleXMLElement($query, null, true);

foreach ($xml->results->div as $product) {
    vprintf("%s\n", array(
        $product->div[1]->div[1]->a,
    ));
}

/*
    Engineering Thermodynamics
    A Textbook of Fluids Mechanics
    The Design of Everyday Things
    A Forest History of India
    Computer Networking
    The Story of Microsoft
    Private Empire: ExxonMobil and Americ...
    Project Management Metrics, KPIs, and...
    Design and Analysis of Experiments: I...
    IES - 2013: General English
    Foundation of Software Testing: ISTQB...
    Faster: 100 Ways to Improve your Digi...
    A Textbook of Fluid Mechanics and Hyd...
    Software Engineering for Embedded Sys...
    Communication Skills for Engineers
    Making Things Move DIY Mechanisms for...
    Virtual Instrumentation Using Labview
    Geometric Dimensioning and Tolerancin...
    Power System Protection & Switchgear...
    Computer Networks
*/
于 2013-07-25T12:57:22.483 回答