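I can't figure out how to use PHP to scrape the text of paragraphs from a web page when those paragraphs have no "id" or "class". One approach would be to count and iterate over
the <p> tags inside a <div>, but the div itself closes before any
<p> tag is encountered. I plan to scrape wikitravel.org as a way to learn scraping. Here is a sample of the page source from wikitravel.org: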
<h2><span class="editsection">[<a href="/wiki/en/index.php?title=Kanniyakumari&amp;action=edit&section=18" title="Edit section: Sleep">edit</a>][<a href="#Sleep" title="click to add a sleep listing" onclick="addListing(this, '18', 'sleep', 'Kanniyakumari');">add listing</a>]</span> <span class="mw-headline" id="Sleep">Sleep</span></h2>
<p>There are numerous hotels, residencies etc. in and around Kanyakumari and therefore, staying over is not be a problem. But there are agents, touts and brokers in every nook and corner looking for unsuspecting tourists. Eschew buying or booking rooms from them, as many a time you end up paying a lot more than the actual price. Vivekananda Kendra can be a good option for people looking for a decent, yet cheap accommodation, but it's around 3 km from Kanyakumari. Prefer hotels near the beach especially if you want to watch the sunrise right out of your bed! Note that, you should quote this preference when booking the room or else, you'll always be given a room without a window opening out to the sea. Moreover many a times, these rooms are in great demand and you'll find yourself shelling a extra 400 - 500 Rs (~10 US$)for such a room. Hotel Sea View, Hotel Sangam and a couple of other hotels offer such rooms and the rent is about Rs. 1100 (~ 25 US$) for 12 hrs. Note that many rooms are priced for 12 hrs and not per day especially during the peak season.
</p>
<p>ATM's in Kanyakumari:</p>
<p>Canara Bank
Main Road, Kanyakumari 629702, ,
</p>
<p>Indian Bank
S No 658 / 1, National High Way Opp St Antony'S Higher Secondary Sckanyakumari 629702
</p>
<p>State Bank Of Travancore
P.B.No.1, 1/17 Amman Sannathi Street, Kanyakumari, Tamil Nadu, 629702
</p>
Can anyone help? Thanks in advance!
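
One possible direction, as a minimal sketch rather than a definitive solution: although the paragraphs themselves carry no "id" or "class", the section heading in the sample does (`<span class="mw-headline" id="Sleep">`). Assuming the live page keeps that structure, PHP's built-in `DOMDocument` and `DOMXPath` can locate the heading and then walk its following siblings, collecting each `<p>` until the next `<h2>`. The function name `sectionParagraphs` and the shortened `$html` string are hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: return the text of every <p> that follows the <h2>
// whose headline <span> has the given id, stopping at the next <h2>.
function sectionParagraphs(string $html, string $headlineId): array
{
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The XML prolog is a common workaround that makes loadHTML() treat
    // the input as UTF-8 (an assumption about the page's encoding).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // All element siblings that come after the matching <h2>.
    $query = sprintf('//h2[span[@id="%s"]]/following-sibling::*', $headlineId);

    $paragraphs = [];
    foreach ($xpath->query($query) as $node) {
        if ($node->nodeName === 'h2') {
            break; // the next section starts here
        }
        if ($node->nodeName === 'p') {
            // Collapse line breaks and runs of spaces into single spaces.
            $paragraphs[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
        }
    }
    return $paragraphs;
}

// Shortened stand-in for the page source shown above.
$html = '<h2><span class="mw-headline" id="Sleep">Sleep</span></h2>'
      . '<p>There are numerous hotels.</p>'
      . "<p>Canara Bank\nMain Road, Kanyakumari 629702</p>"
      . '<h2><span class="mw-headline" id="Eat">Eat</span></h2>'
      . '<p>Belongs to another section.</p>';

$result = sectionParagraphs($html, 'Sleep');
print_r($result);
```

Anchoring on the heading rather than counting `<p>` tags means the code does not depend on how many paragraphs precede the section. For a real page, `$html` would come from whatever download method is available (for example `file_get_contents` or cURL), which is outside this sketch.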