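I'm new to Python and have searched Stack Overflow in vain for an answer I can understand. Thanks in advance for any help or advice.
I'm trying to scrape price and location information from a property-sales website, i.e. the information inside elements with the "field-content" class.
The problem is that the page has many "field-content" elements, and the code I started with pulls out and prints what looks like a random selection of them.
Here's what I'm trying to scrape: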
<div class="view-content">
<div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
<div class="views-field views-field-field-summary">
<div class="field-content">
Land for sale in Prestatyn, Flintshire. Three acres of land with outline planning permission for three large, 4 bedroomed detached houses.
</div>
</div>
<div class="views-field views-field-field-price">
<span class="views-label views-label-field-price">PRICE: </span>
<span class="field-content">£297,500</span>
</div>
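The price sits in a `<span class="field-content">` nested inside a wrapper `<div>` whose class includes `views-field-field-price`, so one way to avoid picking up every other "field-content" element on the page is to anchor the XPath on that wrapper and step down from it. This is a minimal sketch, assuming the live page uses the same wrapper classes as the snippet above (not checked against the current site). The helper names `has_class` and `extract_listing` are hypothetical, and the snippet is inlined (with the two outer closing `</div>` tags it was missing) so the example runs without a network request:

```python
from lxml import html

# The snippet from the question, inlined so the example runs offline.
SNIPPET = """
<div class="view-content">
<div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
<div class="views-field views-field-field-summary">
<div class="field-content">
Land for sale in Prestatyn, Flintshire. Three acres of land with outline planning permission for three large, 4 bedroomed detached houses.
</div>
</div>
<div class="views-field views-field-field-price">
<span class="views-label views-label-field-price">PRICE: </span>
<span class="field-content">£297,500</span>
</div>
</div>
</div>
"""


def has_class(name):
    # Hypothetical helper: an XPath 1.0 predicate that is true when the
    # class attribute contains `name` as a whole space-separated token.
    return f'contains(concat(" ", normalize-space(@class), " "), " {name} ")'


def extract_listing(tree):
    # Anchor on each wrapper div's field-specific class, then step down to
    # its direct field-content child, so unrelated field-content elements
    # elsewhere on the page are ignored.
    summary = tree.xpath(
        f'//div[{has_class("views-field-field-summary")}]'
        '/div[@class="field-content"]/text()'
    )
    price = tree.xpath(
        f'//div[{has_class("views-field-field-price")}]'
        '/span[@class="field-content"]/text()'
    )
    return {
        # Join in case the text is split across several text nodes.
        "summary": " ".join(t.strip() for t in summary).strip(),
        "price": price[0].strip() if price else None,
    }


tree = html.fromstring(SNIPPET)
listing = extract_listing(tree)
# listing["price"] is "£297,500"
```

With the live page, you would pass `html.fromstring(page.content)` from the question's code to `extract_listing` instead of the inlined snippet.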
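This is my basic attempt at getting the price back. I haven't got very far: scraping anything beyond the price and saving it to a ScraperWiki table is still a long way off!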
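#!/usr/bin/env python3
from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

# The label span ("PRICE: ") has a class that is specific to the price field.
Type1 = tree.xpath('//span[@class="views-label views-label-field-price"]/text()')
# This matches every <span class="field-content"> anywhere on the page,
# so on the full page it returns other fields as well as the price.
price = tree.xpath('//span[@class="field-content"]/text()')

print('Type1:', Type1)
print('price:', price)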
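For the later goal of saving results to a table, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in; it is not the ScraperWiki library's API. The table name `listings`, its columns, and the helper `save_listing` are hypothetical. Keying rows by URL means re-running the scraper overwrites a listing rather than duplicating it:

```python
import sqlite3


def save_listing(conn, url, summary, price):
    # Hypothetical schema: one row per listing page, keyed by its URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "url TEXT PRIMARY KEY, summary TEXT, price TEXT)"
    )
    # INSERT OR REPLACE overwrites an existing row with the same URL.
    conn.execute(
        "INSERT OR REPLACE INTO listings (url, summary, price) VALUES (?, ?, ?)",
        (url, summary, price),
    )
    conn.commit()


# ":memory:" keeps the demo self-contained; use a file path to keep data between runs.
conn = sqlite3.connect(":memory:")
save_listing(
    conn,
    "http://www.plotfinder.net/plot/plot-jaslin",
    "Land for sale in Prestatyn, Flintshire.",
    "£297,500",
)
rows = conn.execute("SELECT url, price FROM listings").fetchall()
```

The price is stored as the scraped text; converting it to a number (e.g. stripping "£" and commas) is a separate choice left open here.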