<?php

    $i = 1;
    while ($i <= 5) {
        $url = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_0#'.$i;
        echo $url;
        $html = file_get_contents($url);
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $xPath = new DOMXPath($dom);
        $classname = "zg_title";
        $elements = $xPath->query("//*[contains(@class, '$classname')]");
        foreach ($elements as $e) {
            $lnk = $e->getAttribute('href');
            $e->setAttribute("href", "http://www.amazon.in".$lnk);
            $newdoc = new DOMDocument;
            $e = $newdoc->importNode($e, true);
            $newdoc->appendChild($e);
            $html = $newdoc->saveHTML();
            echo $html;
        }
        $i++;
    }
?>

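I am trying to scrape Amazon's bestsellers page, which lists the top 100 best-selling items, 20 per page. On each loop iteration the value of $i changes and is appended to the URL. However, only the first 20 items are displayed, five times over. I think this has something to do with AJAX pagination, but I can't figure out what.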

1 Answer

Try this:

<?php

    $i = 1;
    while ($i <= 5) {
        // Put the page number in both the path and the pg query parameter.
        $url = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_electronics_pg_'.$i.'?ie=UTF8&pg='.$i;
        echo $url;
        $html = file_get_contents($url);
        $dom = new DOMDocument();
        @$dom->loadHTML($html); // @ suppresses warnings about malformed markup
        $xPath = new DOMXPath($dom);
        $classname = "zg_title";
        $elements = $xPath->query("//*[contains(@class, '$classname')]");
        foreach ($elements as $e) {
            // Prefix the href with the site's host.
            $lnk = $e->getAttribute('href');
            $e->setAttribute("href", "http://www.amazon.in".$lnk);
            // Copy the node into a fresh document so only that node is serialized.
            $newdoc = new DOMDocument;
            $e = $newdoc->importNode($e, true);
            $newdoc->appendChild($e);
            $html = $newdoc->saveHTML();
            echo $html;
        }
        $i++;
    }
?>

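Change your $url so that the page number is part of the path and the pg query parameter instead of following the #.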
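
As for why the original loop repeated the first 20 items: everything after # in a URL is a fragment, which the client keeps to itself and does not send to the server, so all five requests asked for the same document. The sketch below makes no network calls; it only uses parse_url to show which part of each URL would reach the server. The helper names fragmentUrl, queryUrl and requestTarget are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: the URL style from the question (page number after '#').
function fragmentUrl(int $page): string {
    return 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_0#' . $page;
}

// Hypothetical helper: the URL style from this answer (page number in path and query).
function queryUrl(int $page): string {
    return 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_electronics_pg_'
        . $page . '?ie=UTF8&pg=' . $page;
}

// Hypothetical helper: path plus query string, i.e. the part of the URL
// that is sent in the request. The fragment is deliberately left out.
function requestTarget(string $url): string {
    $parts = parse_url($url);
    $target = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $target .= '?' . $parts['query'];
    }
    return $target;
}

for ($i = 1; $i <= 2; $i++) {
    echo requestTarget(fragmentUrl($i)), "\n"; // same line for every $i
    echo requestTarget(queryUrl($i)), "\n";    // changes with $i
}
```

With the fragment-style URL the printed request target is identical on every iteration, while the query-style URL yields a different target per page, which is why only the second form returns pages 2 to 5.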

Answered 2015-10-28T09:26:02.153