python-2.7 - Problem printing data found under different classes in an HTML page, using Beautiful Soup

Question

I need to scrape device prices from a website. The site lists prices in two forms:

A single price, e.g. $99.99
A price range, e.g. "$49.99" to "$99.99"

A single price sits under one class, and I can extract those values, but a price range is spread across two classes, for example:

<div class="gridPrice">"$199.99" 
 <span class="multiDevicePrice-to">to</span> "$399.99"

Prices given as a range are enclosed in double quotes, while single-value prices have no quotes.

I am using the following code:

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup
page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor1 in soup.findAll('div', {"class": "listGrid-price"},text=True):
    if anchor1.string:
        print unicode(anchor1.string).strip()
for anchor2 in soup.findAll('div', {"class": "gridPrice"},text=True):
    if anchor2.string:
        print unicode(anchor2.string).strip()

The output does not include the price-range values. What I need is a list of all prices.

score 1 · Accepted Answer

You can use the .stripped_strings attribute to get an iterable of all the (stripped) text values in a given tag:

for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    textcontent = u' '.join(anchor1.stripped_strings)
    if textcontent:
        print textcontent
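
For context, a tag's .string is None in bs4 whenever the tag has more than one child, which is likely why the range divs produced no output in the question. The following is a minimal sketch of the .stripped_strings approach, run against a small inline document rather than the live page. SAMPLE_HTML and the helper name price_texts are hypothetical, made up for illustration; the markup is only modelled on the snippet in the question, and find_all is the bs4 spelling of the legacy findAll alias.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's snippet, not the live page.
SAMPLE_HTML = u"""
<div class="listGrid-price">$99.99</div>
<div class="listGrid-price">$199.99
 <span class="multiDevicePrice-to">to</span> $399.99</div>
"""


def price_texts(html):
    """Return one joined text string per price div (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", {"class": "listGrid-price"}):
        # stripped_strings yields each non-empty text node, whitespace-trimmed,
        # including text nested inside child tags such as the <span>.
        text = u" ".join(div.stripped_strings)
        if text:
            results.append(text)
    return results


if __name__ == "__main__":
    for line in price_texts(SAMPLE_HTML):
        print(line)
```

Collecting the strings into a list instead of printing them directly also gives the "list of all prices" the question asks for.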

You may only need one or two of those values; itertools.islice can help there:

from itertools import islice

for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    textcontent = u' '.join(islice(anchor1.stripped_strings, 0, 3, 2))
    if textcontent:
        print textcontent

The islice call only selects the first and third elements, which are the from and to prices in the grid.
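
The slice arguments can be checked in isolation with plain lists standing in for the strings a tag would yield. This is only a sketch: pick_prices and the sample lists are hypothetical names and data, and the layout of three strings per range (price, "to", price) is an assumption taken from the snippet in the question.

```python
from itertools import islice


def pick_prices(strings):
    """Keep the items at index 0 and 2 of any iterable (hypothetical helper)."""
    # start=0, stop=3, step=2: take index 0, skip index 1, take index 2
    return list(islice(strings, 0, 3, 2))


range_parts = ["$199.99", "to", "$399.99"]  # assumed layout of a price range
single_part = ["$99.99"]                    # a single price has one string

print(pick_prices(range_parts))  # ['$199.99', '$399.99']
print(pick_prices(single_part))  # ['$99.99']
```

Because islice simply stops when the iterable runs out, the same call works for single prices without a length check.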
