python - How to selectively scrape HTML that has repeated class IDs

Question

I'm new to Python and have searched Stack Overflow in vain for an answer I can understand. Thanks in advance for any help or suggestions.

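I'm trying to scrape price and location information from a property-sales website, i.e. the text in the elements that have the class "field-content".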

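The problem is that the page has many "field-content" elements, and the code I started with pulls out and prints a seemingly random selection of them.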

Thanks in advance for your help.

Here is what I'm trying to scrape:

<div class="view-content">
<div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
        <div class="views-field views-field-field-summary">        
<div class="field-content">
Land for sale in Prestatyn, Flintshire. Three acres of land with outline planning permission for three large, 4 bedroomed detached houses.
</div> 
 </div>  
         <div class="views-field views-field-field-price">    
<span class="views-label views-label-field-price">PRICE: </span>   
 <span class="field-content">£297,500</span>  
</div>
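A quick way to see why a class-only query struggles here: in the snippet above, "field-content" is carried by both the summary <div> and the price <span>, so the class by itself says nothing about which field an element holds; only the enclosing views-field-field-* wrapper does. The sketch below parses a trimmed, closed-up copy of that snippet (the SNIPPET string is a reconstruction for illustration, not the live page) and lists the tag of every element with that class.

```python
from lxml import html

# Trimmed copy of the question's markup, with the open <div>s closed so it parses standalone.
SNIPPET = """
<div class="view-content">
  <div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
    <div class="views-field views-field-field-summary">
      <div class="field-content">Land for sale in Prestatyn, Flintshire.</div>
    </div>
    <div class="views-field views-field-field-price">
      <span class="views-label views-label-field-price">PRICE: </span>
      <span class="field-content">£297,500</span>
    </div>
  </div>
</div>
"""

tree = html.fromstring(SNIPPET)

# Every element whose class attribute mentions "field-content", whatever its tag or parent.
matches = tree.xpath('//*[contains(@class, "field-content")]')
tags = [el.tag for el in matches]
print(tags)  # ['div', 'span']
```

Both fields come back, so whatever distinguishes the price has to come from the surrounding structure rather than from the "field-content" class.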

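Here is my basic attempt at getting it to return the price. I haven't got far yet; scraping anything beyond the price and saving it to a ScraperWiki table is still a long way off!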

#!/usr/bin/env python3

from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

# The label span: matched by its exact class string
Type1 = tree.xpath('//span[@class="views-label views-label-field-price"]/text()')
# Matches every <span> on the page whose class is exactly "field-content", not just the price
price = tree.xpath('//span[@class="field-content"]/text()')

print('Type1: ', Type1)
print('price: ', price)
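One way to stop the extra matches, sketched under the assumption that the live page wraps each field the same way as the snippet in the question: start from the wrapper whose class names the field (views-field-field-price) and only then step down to its "field-content" child. The SNIPPET string is a trimmed stand-in for the page; to run against the site, pass page.content from the requests call to html.fromstring instead.

```python
from lxml import html

# Trimmed stand-in for the page (assumed to match the live markup's structure).
SNIPPET = """
<div class="view-content">
  <div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
    <div class="views-field views-field-field-summary">
      <div class="field-content">Land for sale in Prestatyn, Flintshire.</div>
    </div>
    <div class="views-field views-field-field-price">
      <span class="views-label views-label-field-price">PRICE: </span>
      <span class="field-content">£297,500</span>
    </div>
  </div>
</div>
"""

tree = html.fromstring(SNIPPET)

# Anchor on the price wrapper first, then take its direct "field-content" <span> child;
# the summary's "field-content" <div> lives under a different wrapper and is never reached.
PRICE_XPATH = (
    '//div[contains(@class, "views-field-field-price")]'
    '/span[contains(@class, "field-content")]/text()'
)
price = [t.strip() for t in tree.xpath(PRICE_XPATH)]
print(price)  # ['£297,500']
```

The same pattern works for the other fields: swap views-field-field-price for views-field-field-summary (and span for div) to pull the summary text.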

score 0 · Accepted Answer

You can try this:

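from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

# In the question's markup, the label is the only <span> whose class mentions "field-price"
Type1 = tree.xpath('//span[contains(@class,"field-price")]/text()')
# From that label, take the first following sibling <span> with "field-content": the price itself
price = tree.xpath('//span[contains(@class,"field-price")]/following-sibling::span[contains(@class,"field-content")][1]/text()')

print('Type1: ', Type1)
print('price: ', price)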

Hopefully this gets you the result you want.
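The contains() calls above do substring matching, so contains(@class, "field-content") would also match a hypothetical class such as "field-content-extra". A stricter variant, sketched below, pads the class attribute with spaces and looks for the whole token, and it walks each views-row separately so that a listing's summary (which is where this listing mentions its location) stays paired with its price. The has_class helper and the SNIPPET string are illustrative assumptions, not part of lxml or of the answer.

```python
from lxml import html

# Trimmed stand-in for the page (assumed to match the live markup's structure).
SNIPPET = """
<div class="view-content">
  <div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
    <div class="views-field views-field-field-summary">
      <div class="field-content">Land for sale in Prestatyn, Flintshire.</div>
    </div>
    <div class="views-field views-field-field-price">
      <span class="views-label views-label-field-price">PRICE: </span>
      <span class="field-content">£297,500</span>
    </div>
  </div>
</div>
"""


def has_class(name):
    """Hypothetical helper: an XPath predicate true when `name` is a whole
    class token (contains() on its own would also accept substrings)."""
    return f'contains(concat(" ", normalize-space(@class), " "), " {name} ")'


tree = html.fromstring(SNIPPET)

listings = []
for row in tree.xpath(f'//div[{has_class("views-row")}]'):
    # Paths starting with "." stay inside the current row; string() returns the
    # text of the first matching node, or "" when nothing matches.
    summary = row.xpath(
        f'string(.//div[{has_class("views-field-field-summary")}]'
        f'//*[{has_class("field-content")}])'
    ).strip()
    price = row.xpath(
        f'string(.//div[{has_class("views-field-field-price")}]'
        f'//*[{has_class("field-content")}])'
    ).strip()
    listings.append({"summary": summary, "price": price})

print(listings)
```

Running the queries relative to each row is what keeps one listing's price from being paired with another listing's summary when a page shows several rows.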
