php - How to use curl to extract a link from the head tag of a remote page

Question

I have some URLs, and the HTML of every one of them contains the following tag inside its <head> tag:

    <link rel="image_src" href="http://imgv2-4.scribdassets.com/img/word_document/15490455/164x212/8a4ab0c34b/1337732662" />

I am using the following code:

    $url = 'my url';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // The url to get links from
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the response
    $result = curl_exec($ch);

    $regex = '|<a.*?href="(.*?)"|';
    preg_match_all($regex, $result, $parts);
    $links = $parts[1];
    foreach ($links as $link) {
        //if (strpos($link, 'format=json') !== false) {
            echo $link;
        //}
    }

Now I want to grab the href of this link, but I don't know how. Please help.

Thanks.

score 2 · Accepted Answer

I prefer walking the HTML with PHP's DOMDocument rather than using preg_match. Something like this should work:

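    // DOMXPath needs a DOMDocument, so load the fetched HTML first
    $dom = new DOMDocument();
    @$dom->loadHTML($result); // @ silences warnings caused by malformed markup
    $xpath = new DOMXPath($dom);
    $links = $xpath->query('//link[@rel="image_src"]');
    foreach ($links as $link) {
        // The URL is in the href attribute; a <link> element has no text content
        $src = $link->getAttribute('href');
    }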
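
For something that runs on its own, here is a minimal sketch of the same XPath idea. The function name extractImageSrc and the example.com sample markup are made up for illustration; only the DOMDocument, DOMXPath and libxml calls are PHP's own. Note that DOMXPath expects a DOMDocument rather than a raw HTML string, so the HTML is loaded first.

```php
<?php
// Hypothetical helper: returns the href of the first <link rel="image_src">,
// or null when the tag is missing or the HTML cannot be parsed.
function extractImageSrc(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml warnings instead of printing them; real pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    // Select the href attribute node directly instead of the element.
    $nodes = $xpath->query('//link[@rel="image_src"]/@href');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->nodeValue;
}

// Sample markup standing in for the curl response (assumed, not a real page).
$sample = '<html><head><link rel="image_src" href="http://example.com/thumb.png" /></head><body></body></html>';
echo extractImageSrc($sample), "\n"; // prints http://example.com/thumb.png
```

In the question's code you would pass $result (the string returned by curl_exec) instead of $sample.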

score 2 · Accepted Answer

Here is another option that worked for me. It is similar to the DOMXPath approach that @Mark Roach suggested.

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $nodes = $dom->getElementsByTagName('link');
    foreach ($nodes as $node) {
        if ($node->getAttribute('rel') === 'image_src') {
            echo($node->getAttribute('href'));
        }
    }
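
The snippet above assumes $html already holds the page source. As a rough sketch of how the pieces could fit together, the following pairs a cURL fetch with the same getElementsByTagName loop. The helper names fetchHtml and findImageSrc, the redirect and timeout settings, and the example.com sample markup are my own assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or return null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // assumption: redirects should be followed
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumption: 15 seconds is an acceptable limit
    $body = curl_exec($ch);                         // false on failure

    return (is_string($body) && $body !== '') ? $body : null;
}

// Hypothetical helper: loop over <link> elements and return the image_src href, or null.
function findImageSrc(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // keep parse warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('link') as $node) {
        // Compare case-insensitively in case a page writes rel="IMAGE_SRC".
        if (strcasecmp(trim($node->getAttribute('rel')), 'image_src') === 0) {
            return $node->getAttribute('href');
        }
    }

    return null;
}

// Offline demonstration with sample markup (assumed, not a real page).
$sample = '<html><head><link rel="stylesheet" href="/site.css" />'
        . '<link rel="image_src" href="http://example.com/thumb.png" /></head><body></body></html>';
echo findImageSrc($sample), "\n"; // prints http://example.com/thumb.png

// Against a live page it would look something like this:
// $html = fetchHtml('http://www.scribd.com/doc/15490455/Learning-PHP-5');
// if ($html !== null) { echo findImageSrc($html) ?? 'no image_src tag found'; }
```

Returning null from both helpers lets the caller tell a failed request apart from a page that simply has no image_src link.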

score 0 · Accepted Answer

Like this:

    <?php
    $url = 'http://www.scribd.com/doc/15490455/Learning-PHP-5';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // The url to get links from
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // We want to get the response
    $result = curl_exec($ch);

    // [^"]* stops at the closing quote, so the match cannot run on into later attributes or tags
    $regex = '#<link rel="image_src" href="([^"]*)"\s*/?>#';

    if (preg_match($regex, $result, $parts)) {
        echo $parts[1]; // first capture group: the href value
    }
    ?>
