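I have a task to parse http://www.olx.in/cars-cat-378 with regular expressions and extract the car, location, and price of each listing. I have seen many posts suggesting that regular expressions are not suited to parsing web pages, but for this task I have to use them anyway. My attempt is shown below, but it does not work.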
<?php
/**
* Initialize the cURL session
*/
$ch = curl_init();
/**
* Set the URL of the page or file to download.
*/
curl_setopt($ch, CURLOPT_URL, 'http://www.olx.in/cars-cat-378');
/**
* Ask cURL to return the contents in a variable instead of simply echoing them to the browser.
*/
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
/**
* Execute the cURL session
*/
$contents = curl_exec($ch);
/**
* Pattern meant to match a single listing item and capture its fields.
*/
$reg='/<div class="li .*?"><div class="row clearfix"><div class="c-1 table-cell"><div class="cropit">.*?<\/div><\/div><div class="second-column-container table-cell"><h3><a .*?>(.*?)<\/a><\/h3><div class="c-4"><span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span><\/div><span class="itemlistinginfo clearfix"><a .*?>(.*?)<\/a><\/span><div .*?><\/div><\/div><div class="third-column-container table-cell">(.*?)<\/div><div class="fourth-column-container table-cell">(.*?)<\/div><\/div><\/div>/';
preg_match($reg,$contents,$result);
var_dump($result);
/**
* Close cURL session
*/
curl_close ($ch);
?>
The HTML for each listing item on the page is as follows:
<div class="li even">
<div class="row clearfix">
<div class="c-1 table-cell">
<div class="cropit">
<a class="pics-lnk" href="http://newdelhi.olx.in/honda-prelude-2-door-sports-car-for-sale-iid-437128570">
<img src="http://images04.olx-st.com/ui/14/85/70/t_1347220402_437128570_4.jpg" width="111"
alt="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE." title="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE. - India"
height="83" style="margin-top:0px;" />
</a>
</div>
</div>
<div class="second-column-container table-cell">
<h3>
<a href="http://newdelhi.olx.in/honda-prelude-2-door-sports-car-for-sale-iid-437128570" title="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE. - India">
HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE.</a>
</h3>
<div class="c-4">
<span>Year: 1996</span> - <span>Make: Honda</span> - <span>Model: Prelude</span> - <span>66,400.00 km</span> </div>
<span class="itemlistinginfo clearfix">
<a href="http://newdelhi.olx.in/cars-cat-378">Cars - Delhi</a> </span>
<div style="display:none;" class="fbfriends_loadme" id="fbfriends_loadme_437128570" rel="5656149"></div>
</div>
<div class="third-column-container table-cell">
र 2,65,000.00 </div>
<div class="fourth-column-container table-cell">
Yesterday, 15:53 </div>
</div>
</div>
The regular expression I am using is:
/<div class="li .*?"><div class="row clearfix"><div class="c-1 table-cell"><div class="cropit">.*?<\/div><\/div><div class="second-column-container table-cell"><h3><a .*?>(.*?)<\/a><\/h3><div class="c-4"><span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span> - <span>(.*?)<\/span><\/div><span class="itemlistinginfo clearfix"><a .*?>(.*?)<\/a><\/span><div .*?><\/div><\/div><div class="third-column-container table-cell">(.*?)<\/div><div class="fourth-column-container table-cell">(.*?)<\/div><\/div><\/div>/
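A likely reason the pattern above never matches: it expects each tag to follow the previous one with nothing in between (for example `"><div class="row clearfix">`), while the markup shown has line breaks between tags. Without the `s` modifier, `.` does not match a newline either, and `preg_match` stops after the first match instead of collecting every listing. Below is a minimal sketch of a whitespace-tolerant variant, assuming the live page is valid UTF-8 and uses the same structure as the sample item. The function name `extractListings` and the named groups `title`, `location`, and `price` are illustrative choices, not anything the site defines, and the sketch runs on an abbreviated copy of the sample markup (image block omitted) rather than on the live page.

```php
<?php
// Abbreviated copy of the listing markup from the question (image block omitted).
$html = <<<'SAMPLE'
<div class="li even">
<div class="row clearfix">
<div class="c-1 table-cell">
<div class="cropit">
</div>
</div>
<div class="second-column-container table-cell">
<h3>
<a href="http://newdelhi.olx.in/honda-prelude-2-door-sports-car-for-sale-iid-437128570" title="HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE. - India">
HONDA PRELUDE,,2 DOOR ,,SPORTS CAR FOR SALE.</a>
</h3>
<div class="c-4">
<span>Year: 1996</span> - <span>Make: Honda</span> - <span>Model: Prelude</span> - <span>66,400.00 km</span> </div>
<span class="itemlistinginfo clearfix">
<a href="http://newdelhi.olx.in/cars-cat-378">Cars - Delhi</a> </span>
<div style="display:none;" class="fbfriends_loadme" id="fbfriends_loadme_437128570" rel="5656149"></div>
</div>
<div class="third-column-container table-cell">
र 2,65,000.00 </div>
<div class="fourth-column-container table-cell">
Yesterday, 15:53 </div>
</div>
</div>
SAMPLE;

/**
 * Hypothetical helper: returns one ['car', 'location', 'price'] array per listing.
 */
function extractListings(string $html): array
{
    // "~" as delimiter avoids escaping "/"; "s" lets "." match newlines;
    // "u" treats pattern and subject as UTF-8 (the price contains a rupee sign).
    // "\s*" absorbs the whitespace and line breaks around the captured text.
    $pattern = '~<div class="li [^"]*">.*?'
        . '<h3>\s*<a [^>]*>\s*(?<title>.*?)\s*</a>\s*</h3>.*?'
        . '<span class="itemlistinginfo clearfix">\s*<a [^>]*>\s*(?<location>.*?)\s*</a>.*?'
        . '<div class="third-column-container table-cell">\s*(?<price>.*?)\s*</div>~su';

    // preg_match_all collects every listing; preg_match would stop at the first.
    $count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    if ($count === false || $count === 0) {
        return []; // regex error (e.g. invalid UTF-8) or no listing found
    }

    $listings = [];
    foreach ($matches as $m) {
        $listings[] = [
            'car'      => $m['title'],
            'location' => $m['location'],
            'price'    => $m['price'],
        ];
    }
    return $listings;
}

$listings = extractListings($html);
print_r($listings);
```

To try it against the real page, pass the `$contents` returned by `curl_exec` to `extractListings` after checking that it is not `false`. The lazy `.*?` gaps keep the pattern short, but they can run across item boundaries if a listing lacks one of the expected blocks, so the results are worth spot-checking.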