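I can't figure out how to use PHP to scrape the underlying text of the paragraphs on a web page that doesn't use any "id" or "class" attributes for them. One approach would be to count and iterate over the <p> tags inside a <div>, but the div itself closes before any <p> tag is encountered. I'm planning to scrape wikitravel.org as a way to learn scraping. Here is a sample of the page source from wikitravel.org: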

   <h2><span class="editsection">[<a href="/wiki/en/index.php?title=Kanniyakumari&amp;action=edit&amp;section=18" title="Edit section: Sleep">edit</a>][<a href="#Sleep" title="click to add a sleep listing" onclick="addListing(this, '18', 'sleep', 'Kanniyakumari');">add listing</a>]</span> <span class="mw-headline" id="Sleep">Sleep</span></h2>

   <p>There are numerous hotels, residencies etc. in and around Kanyakumari and therefore, staying over is not be a problem. But there are agents, touts and brokers in every nook and corner looking for unsuspecting tourists. Eschew buying or booking rooms from them, as many a time you end up paying a lot more than the actual price. Vivekananda Kendra can be a good option for people looking for a decent, yet cheap accommodation, but it's around 3 km from Kanyakumari. Prefer hotels near the beach especially if you want to watch the sunrise right out of your bed! Note that, you should quote this preference when booking the room or else, you'll always be given a room without a window opening out to the sea. Moreover many a times, these rooms are in great demand and you'll find yourself shelling a extra 400 - 500 Rs (~10 US$)for such a room. Hotel Sea View, Hotel Sangam and a couple of other hotels offer such rooms and the rent is about Rs. 1100 (~ 25 US$) for 12 hrs. Note that many rooms are priced for 12 hrs  and not per day especially during the peak season.
</p>

<p>ATM's in Kanyakumari:</p>

 <p>Canara Bank 
 Main Road, Kanyakumari 629702, ,
 </p>
 <p>Indian Bank 
  S No 658 / 1, National High Way Opp St Antony'S Higher Secondary Sckanyakumari 629702
 </p>
<p>State Bank Of Travancore 
P.B.No.1, 1/17 Amman Sannathi Street, Kanyakumari, Tamil Nadu, 629702
</p>

Can anyone help? Thanks in advance!

2 Answers

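I've always found jQuery to be the best way to scrape HTML data. Have PHP render a page that uses jQuery to parse the scraped HTML and then post a JSON dataset back to PHP.

If you want to stick to the pure PHP route, try this library: http://simplehtmldom.sourceforge.net/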
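
The PHP side of that round trip could look roughly like the sketch below. It only covers receiving the JSON; the browser-side jQuery script is not shown. The function name `decodeScrapedParagraphs` and the payload shape `{"paragraphs": [...]}` are assumptions made for illustration and do not come from the answer.

```php
<?php
// Hypothetical receiver for the JSON that the browser-side script posts back.
// ASSUMPTION: the payload looks like {"paragraphs": ["text one", "text two"]}.
function decodeScrapedParagraphs(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['paragraphs']) || !is_array($data['paragraphs'])) {
        return []; // malformed or unexpected payload
    }

    // Keep only non-empty strings, trimmed of surrounding whitespace.
    $clean = [];
    foreach ($data['paragraphs'] as $p) {
        if (is_string($p) && trim($p) !== '') {
            $clean[] = trim($p);
        }
    }
    return $clean;
}

// In a real endpoint the body would come from the request, for example:
// $paragraphs = decodeScrapedParagraphs(file_get_contents('php://input'));
```

Validating the payload before using it matters here because anything a browser posts back can be tampered with.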

answered 2013-04-10T07:48:58.650

Take a look at the simplehtmldom parser. It works with jQuery-like selectors.

Here is an example for your case:

include 'simple_html_dom.php'; // library file that defines file_get_html()

$html = file_get_html('http://www.wikitravel.com/yourpage');
foreach ($html->find('p') as $element) {
    echo $element->innertext; // the content of every p tag on the page
}
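
The loop above returns every paragraph on the page. If only the paragraphs under one heading are wanted (for example "Sleep" in the question's sample), one possible alternative is PHP's built-in DOMDocument and DOMXPath classes, swapped in here for simplehtmldom. This is a minimal sketch: it assumes each section heading is an h2 containing a span with an id, as in the quoted source, and that the paragraphs are siblings of that h2. The function name `paragraphsUnderHeading` and the shortened sample markup are made up for illustration.

```php
<?php
// Sketch: return the text of every <p> that follows the <h2> whose <span>
// has the given id, stopping at the next <h2>.
// ASSUMPTION: $headingId is a plain id with no quote characters in it.
function paragraphsUnderHeading(string $html, string $headingId): array
{
    // Real-world pages are rarely valid markup; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // A <p> qualifies when its nearest preceding <h2> sibling is the target heading.
    $cond  = sprintf('span[@id="%s"]', $headingId);
    $query = sprintf('//h2[%1$s]/following-sibling::p[preceding-sibling::h2[1][%1$s]]', $cond);

    $texts = [];
    foreach ($xpath->query($query) as $p) {
        // Collapse runs of whitespace and line breaks inside each paragraph.
        $texts[] = trim(preg_replace('/\s+/', ' ', $p->textContent));
    }
    return $texts;
}

// Shortened stand-in for the page source quoted in the question.
$sample = '<div><h2><span class="mw-headline" id="Sleep">Sleep</span></h2>'
    . '<p>There are numerous hotels.</p><p>Canara Bank   Main Road</p>'
    . '<h2><span class="mw-headline" id="Contact">Contact</span></h2>'
    . '<p>Phone numbers.</p></div>';

print_r(paragraphsUnderHeading($sample, 'Sleep'));
```

For a live page, the HTML string would first have to be fetched, for example with file_get_contents(). Note that loadHTML() assumes ISO-8859-1 when the markup declares no character set, so UTF-8 pages may need an encoding hint.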
answered 2013-04-10T07:50:48.230