
I am building a PHP data miner (scraper), and I have this HTML fragment:

<label class='area'>
  <font class='bg_info' onmouseover="land_convert_txt(this,3067)" onmouseout='tooltip_hide()'>
   3,067 Sq. Ft.
  </font>

How do I set up my regex to extract only the area value?

This is my function:

function extract_regex($subject, $regex, $index = 1)
{
    preg_match_all($regex, $subject, $matches);
    if (count($matches[$index]))
    {
        if (count($matches[$index]) == 1)
        {
            return trim($matches[$index][0]);
        }
        return $matches[$index];        
    }
    return '';
}

The number in (this,3067) keeps changing!

Thank you in advance.


2 Answers

1

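Don't use regular expressions to process HTML!
Don't try to reinvent the wheel; you might end up with a square one.

Try a PHP web scraping tool instead, for example: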

http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/

Use it like this:

# create and load the HTML
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load($myHTML);

# get the elements with class "area"; find() returns an array of matches
//$elements = $html->find('label[class=area]');
$elements = $html->find(".area");

# echo the first match (index 0), but only if something was found
if (!empty($elements)) {
    echo $elements[0]->innertext;
}
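
If pulling in a third-party library is not an option, the same "parse it, don't regex it" idea can be sketched with PHP's bundled DOM extension. This is only a rough illustration: the function name extract_area_text is made up for this example, and the closing </label> in the sample markup is an assumption, since the fragment in the question is cut off before it.

```php
<?php
// Sample markup modelled on the question; the closing </label> is assumed.
$myHTML = <<<HTML
<label class='area'>
  <font class='bg_info' onmouseover="land_convert_txt(this,3067)" onmouseout='tooltip_hide()'>
   3,067 Sq. Ft.
  </font>
</label>
HTML;

// Hypothetical helper: returns the visible text of the first <label class="area">.
function extract_area_text(string $html): string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//label[@class='area']");
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }

    // textContent drops the <font> tag and its attributes, so the changing
    // number inside onmouseover never matters.
    return trim($nodes->item(0)->textContent);
}

echo extract_area_text($myHTML), "\n";
```

The XPath test @class='area' only matches when the class attribute is exactly "area"; a label carrying several classes would need a contains() check instead.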
answered 2013-06-30T11:18:57.987
0
function extract_regex($subject, $regex, $index = 1)
{
    preg_match_all($regex, $subject, $matches);
    if (count($matches[$index]))
    {
        if (count($matches[$index]) == 1)
        {
            return trim($matches[$index][0]);
        }
        return $matches[$index];
    }
    return '';
}

// (.*?) is lazy, so each <label> is matched on its own if the page has several;
// the s modifier lets . span line breaks inside the label.
$out = extract_regex("<label class='area'><font class='bg_info' onmouseover='land_convert_txt(this,3067)' onmouseout='tooltip_hide()'>3,067 Sq. Ft.</font></label>","/<label class=\'area\'>(.*?)<\/label>/is");

echo "<xmp>". $out . "</xmp>";
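
The pattern above returns the whole <font> element rather than the area itself. If a regex is used anyway, one possible sketch is to skip the <font> attributes with [^>]* and capture only the number in front of the unit. It assumes that no attribute value contains a ">" character and that the unit is always written "Sq. Ft."; the variable names are illustrative.

```php
<?php
// The question's helper, repeated so this sketch runs on its own.
function extract_regex($subject, $regex, $index = 1)
{
    preg_match_all($regex, $subject, $matches);
    if (count($matches[$index])) {
        if (count($matches[$index]) == 1) {
            return trim($matches[$index][0]);
        }
        return $matches[$index];
    }
    return '';
}

$html = "<label class='area'><font class='bg_info' onmouseover='land_convert_txt(this,3067)' onmouseout='tooltip_hide()'>3,067 Sq. Ft.</font></label>";

// [^>]* swallows every attribute of <font>, so the changing number in
// onmouseover is never part of the pattern. Group 1 is the formatted area.
$regex = '~<label class=\'area\'>\s*<font[^>]*>\s*([\d,.]+)\s*Sq\.\s*Ft\.\s*</font>~i';

$area = extract_regex($html, $regex);            // formatted string
$areaNumber = (int) str_replace(',', '', $area); // plain integer

echo $area, "\n";
echo $areaNumber, "\n";
```

For the sample line this prints 3,067 and then 3067. The \s* parts also tolerate the line breaks and indentation shown in the question's multi-line version of the markup.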
answered 2013-06-30T12:03:00.873