0

Trying to get the href value from this HTML

<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149" style="background-color: rgb(255, 255, 255);">

      <span onclick="_gaq.push(['first._trackEvent','Click','search','watch-image-click']);_gaq.push(['second._trackEvent','Click','search','watch-image-click']);" class="pic ">
        <span style="position:absolute">

          <img width="100" height="100" alt="Rolex Submariner Date" src="" class="photo">
        </span>
      </span>

  <span class="disc">
    <span onclick="_gaq.push(['first._trackEvent','Click','search','watch-headline-click']);_gaq.push(['second._trackEvent','Click','search','watch-headline-click']);" class="watch-headline"><span class="underline">Rolex Submariner Date</span></span>

        <span class="spec">


          <span onmouseover="$('#infobox-title').text('Germany');$('#infobox-text').text('This dealer is from Augsburg, Germany.')" style="width: 21px;" class="flag">

          <img width="16" height="16" alt="" src="http://cdn.chrono24.com/images/flags-icons/DE.png">&nbsp;
            </span>
            <span class="icon i-hasnostore"></span>
                    <span onmouseover="$('#infobox-title').text('Trusted Seller since 2004');$('#infobox-text').text('We have no knowledge about pending/unsolved disputes or complaints about this seller.')" class="icon i-trusted"></span>

                        <span onmouseover="$('#infobox-title').text('Retailer recommendations');$('#infobox-text').text('This watch retailer is recommended on Chrono24 by 1 other watch retailers.')" class="i-buddies">
                          <span class="icon buddie-count">1</span>
                          <span class="icon i-star-blue"></span>
                        </span>


              <span onmouseover="$('#infobox-title').text('Trusted Seller since 2004');$('#infobox-text').text('We have no knowledge about pending/unsolved disputes or complaints about this seller.')" class="trustedseller">
                    <script type="text/javascript">
                        // &lt;![CDATA[
                        document.write('Trusted Seller since 2004');
                        // ]]&gt;
                    </script>Trusted Seller since 2004
                  </span>    


                  <span style="width: 2px;" class="icon"></span>
                  <span onmouseover="$('#infobox-title').text('Premium Seller');$('#infobox-text').text('The Chrono24 Premium Seller Package is only available for Trusted Sellers who frequently use Chrono24.')" class="icon i-premium"></span>
                <span onmouseover="$('#infobox-title').text('Premium Seller');$('#infobox-text').text('The Chrono24 Premium Seller Package is only available for Trusted Sellers who frequently use Chrono24.')" class="premiumseller">Premium</span>

            </span>
            <span onclick="_gaq.push(['first._trackEvent','Click','search','watch-desc-click']);_gaq.push(['second._trackEvent','Click','search','watch-desc-click']);" class="description">
              Ref. No. 116610 LN; Steel; Automatic; Condition 0 (unworn); Year 2013; With Box; With Papers; Location: Germany, Augsburg; The current, the manufacturer's recommended retail price is 6800 Euro
            </span>


              <span class="availability">Availability: Available immediately</span>



  </span>
  <span class="pricebox">
    <span onclick="_gaq.push(['first._trackEvent','Click','search','watch-price-click']);_gaq.push(['second._trackEvent','Click','search','watch-price-click']);" class="amount price"><span class="large">$&nbsp;7,961</span>
    </span>

    <span class="buttonbox">
      <span onclick="_gaq.push(['first._trackEvent','Click','search','watch-button-click']);_gaq.push(['second._trackEvent','Click','search','watch-button-click']);" class="button-blue">
         <span>
          Watch details
         </span>
      </span>
    </span>


  </span>             

</a>
preg_match_all('#<a href="(.+)">#',$html,$urlarr);

This doesn't return the href value at all, and I don't know what is going wrong.

4

4 Answers

2

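Don't use regular expressions on HTML; HTML is not a regular language.

You should take a look at SimpleXML and XPath, which are a perfect fit for this job: http://php.net/manual/en/simplexmlelement.xpath.php

For example: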

$xml   = new SimpleXMLElement($html);

// Select all "a" tags with href attributes
$links = $xml->xpath("//a[@href]");
// You probably want the first one
$href = $links[0]["href"];
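
One caveat with the snippet above: SimpleXMLElement expects well-formed XML, and the markup in the question contains an unclosed <img> tag and an &nbsp; entity, so the constructor would most likely throw on it. Below is a minimal sketch of one way around that, assuming the DOM extension is available: parse with the lenient HTML parser first, then hand the tree to SimpleXML. The function name firstHref and the shortened $html sample are made up for illustration.

```php
<?php
// Hypothetical helper: returns the first href found in an HTML string, or null.
function firstHref(string $html): ?string
{
    // Buffer libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);           // lenient HTML parser, tolerates <img> and &nbsp;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Convert the DOM tree into a SimpleXMLElement so xpath() can be used.
    $xml   = simplexml_import_dom($dom);
    $links = $xml->xpath('//a[@href]');

    return $links ? (string) $links[0]['href'] : null;
}

// Shortened stand-in for the markup in the question.
$html = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm">'
      . '<img alt="" src="">&nbsp;Rolex Submariner Date</a>';

echo firstHref($html), "\n";
```

The cast to string matters because indexing a SimpleXMLElement returns another SimpleXMLElement, not a plain string.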
answered 2013-09-12T14:41:50.413
1

All of the DOM-based methods suggested here should work. If you want to use a regex anyway, you can try this:

preg_match_all('~<a (?>[^>h]++|\Bh|h(?!ref\b))*href\s*=\s*["\']?\K[^"\'>\s]++~i', $html, $matches);
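
To see why the pattern in the question finds nothing while this one does, here is a small sketch run against a shortened copy of the opening tag. The $html sample and the variable names $originalCount and $fixedCount are assumptions for illustration only.

```php
<?php
// Shortened stand-in for the opening tag in the question.
$html = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149">';

// The question's pattern requires href to follow "<a " directly,
// but here the class attribute comes first, so nothing matches.
$originalCount = preg_match_all('#<a href="(.+)">#', $html, $urlarr);

// This pattern skips over other attributes before href; \K discards
// everything matched so far, so the whole match is just the href value.
$fixedCount = preg_match_all(
    '~<a (?>[^>h]++|\Bh|h(?!ref\b))*href\s*=\s*["\']?\K[^"\'>\s]++~i',
    $html,
    $matches
);

echo $originalCount, "\n";   // number of matches for the original pattern
echo $matches[0][0], "\n";   // the href value found by the second pattern
```

Because of \K, the values end up in $matches[0] rather than in a capture group.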

If you only want the href of a tags whose class attribute value is list-item clearfix, you can do this:

$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<class> \b class \s* = \s* (["']) list-item \s+ clearfix \g{-1} )
    (?<href_value> [^"'\s>]++ )
    (?<href_start> \b href \s*=\s* ["']? )
    (?<href_end> ['"\s] )
    (?<content> (?> [^>hc]++ | \B[hc] | h(?!ref\b) | c(?!lass\b) )* )

)
    <a \s+
    \g<content>
    (?J)
    (?>
        \g<class> \g<content> \g<href_start> (?<href> \g<href_value> )
      |
        \g<href_start> (?<href> \g<href_value> ) \g<href_end> \g<content> \g<class>
    )
~xi
LOD;

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER); 

foreach($matches as $match) {
    echo '<br>' . $match['href'];
}

Keep in mind that this is much easier to do with XPath:

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$hrefs = $xpath->query("//a[@class='list-item clearfix']/@href");
foreach($hrefs as $href) {
    print_r($href->nodeValue);
}
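
Note that @class='list-item clearfix' compares the whole attribute string, so it stops matching if the classes appear in a different order or alongside an extra class. Here is a sketch of a more tolerant query, using the common XPath 1.0 idiom for matching a single class token. The helper name hrefsByClass and the sample markup are illustrative assumptions.

```php
<?php
// Hypothetical helper: hrefs of all <a> elements whose class list contains $class.
function hrefsByClass(string $html, string $class): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the normalized class list with spaces so whole tokens are compared.
    // $class is assumed to be a plain class name without quote characters.
    $query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]/@href";

    $hrefs = [];
    foreach ($xpath->query($query) as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

$html = '<a class="clearfix list-item featured" href="/a.htm">A</a>'
      . '<a class="list-items" href="/b.htm">B</a>';

print_r(hrefsByClass($html, 'list-item'));
```

The second link is skipped because list-items is a different token, which a plain contains(@class, 'list-item') would not distinguish.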
answered 2013-09-12T15:01:54.540
1

Instead of using regular expressions, you should use DOMDocument:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set options before loading the markup
$dom->loadHTML($html);
$link  = $dom->getElementsByTagName("a");
$links = array();
for ($i = 0; $i < $link->length; $i++) {
    // collect the href attribute of every <a> element
    $links[] = $link->item($i)->getAttribute("href");
}
answered 2013-09-12T14:40:49.217
0

Parsing HTML with regular expressions is a bad idea (at least in this case). Use a DOM parser for this instead, such as SimpleHTMLDOM:

It's easy:

$html = str_get_html('...');
foreach($html->find('a') as $element) 
    echo $element->href;

Alternatively, you can load it from a file:

$html = file_get_html('...');
foreach($html->find('a') as $element) 
    echo $element->href;

This can also be done with the built-in DOM:

$dom = new DOMDocument();
$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a"); //all <a> tags
$urlArray = array();

for ($i = 0; $i < $hrefs->length; $i++) {
       $href = $hrefs->item($i);
       $urlArray[] = $href->getAttribute('href');
}

See it in action.
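
Real-world pages often contain markup that libxml complains about, and loadHTML() reports each problem as a PHP warning. Here is a sketch of collecting those messages rather than printing them or silencing them with @, assuming only the built-in DOM and libxml functions. The helper extractHrefs and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: returns [list of hrefs, list of parser messages].
function extractHrefs(string $html): array
{
    $previous = libxml_use_internal_errors(true);   // buffer parser errors

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Copy out whatever the parser reported, then reset libxml's state.
    $messages = array_map(
        fn (LibXMLError $e) => trim($e->message),
        libxml_get_errors()
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {             // skip anchors without href
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return [$hrefs, $messages];
}

[$hrefs, $messages] = extractHrefs(
    '<a href="/one.htm">1</a><a name="x">2</a><a href="/two.htm">3</a>'
);
print_r($hrefs);
```

Checking hasAttribute() first avoids storing empty strings for anchors that have no href.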

answered 2013-09-12T14:42:01.103