1

Here is my code.

 import requests
 from bs4 import BeautifulSoup
 res = requests.get('http://www.snapdeal.com/products/computers-laptops?sort=plrty&')
 soup = BeautifulSoup(res.text)
 price = soup.find_all('div', class_="product-price").children

I want to scrape prices from this website, but the div that holds the price has no class of its own, so I don't know how to select it. I found that you can get the children of a div tag, but that isn't working either. I'm trying to get all of those tags.

4

3 Answers

5

There are multiple ways to get the desired price values.

You can use a CSS selector to get the first child div of every div with the product-price class:

for price in soup.select("div.product-price > div:nth-of-type(1)"):
    print(price.get_text(strip=True))

This would print:

Rs  33490Rs 42990(22%)
Rs  26799Rs 31500(15%)
...
Rs  41790Rs 44990(7%)
Rs  48000Rs 50000(4%)

See the nth-of-type documentation for reference.
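
To try the selector without hitting the live site, here is a minimal sketch against a hand-written snippet. The markup in SAMPLE_HTML is an assumption, reconstructed from the output above rather than copied from the page, so the real structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): each product-price div wraps an
# unclassed div holding the current price, the old price and the discount.
SAMPLE_HTML = """
<div class="product-price">
  <div>Rs  33490<strike>Rs 42990</strike><span>(22%)</span></div>
  <div>Free delivery</div>
</div>
<div class="product-price">
  <div>Rs  26799<strike>Rs 31500</strike><span>(15%)</span></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "A > B" matches direct children only; nth-of-type(1) keeps the first
# div among its siblings, so the "Free delivery" div is skipped.
first_children = soup.select("div.product-price > div:nth-of-type(1)")
texts = [div.get_text(strip=True) for div in first_children]

for text in texts:
    print(text)
```

The child combinator matters here: without the ">", nested divs deeper in the tree could match as well.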

Note that, along with the actual price, the output contains the previous price, which the page shows in strikethrough. To drop it, take only the top-level text of the div by calling find() with text=True and recursive=False:

for price in soup.select("div.product-price > div:nth-of-type(1)"):
    print(price.find(text=True, recursive=False).strip())

Prints:

Rs  33490
Rs  26799
...
Rs  41790
Rs  48000
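
The difference between the full text and the top-level text can be seen side by side in a small sketch. The one-line markup is made up for illustration, and the sketch uses string=True, the name Beautiful Soup 4.4.0 and later give to the text argument.

```python
from bs4 import BeautifulSoup

# Made-up markup (an assumption): current price as bare text, followed by
# the old price and the discount in child tags.
html = (
    '<div class="product-price">'
    "<div>Rs  33490<strike>Rs 42990</strike><span>(22%)</span></div>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
inner = soup.find("div", class_="product-price").find("div")

# get_text() walks every descendant, so the old price and discount come along.
all_text = inner.get_text(strip=True)

# recursive=False limits the search to direct children, and string=True
# matches text nodes only, so this returns the div's own leading text.
own_text = inner.find(string=True, recursive=False).strip()

print(all_text)
print(own_text)
```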

You can go further, strip the Rs at the beginning, and convert the price to an int (or float):

for div in soup.select("div.product-price > div:nth-of-type(1)"):
    price = div.find(text=True, recursive=False).strip()
    # Drop the currency prefix; float() ignores the leftover whitespace
    price = float(price.replace("Rs", ""))
    print(price)

Prints:

33490.0
26799.0
...
41790.0
48000.0
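
If the exact spacing after Rs is not reliable, a regular expression may be a sturdier way to pull out the number. This is only a sketch: parse_price is a hypothetical helper, and the comma handling assumes the site might format larger prices with separators, which the output above does not show.

```python
import re


def parse_price(raw):
    """Return the first number found in a string such as 'Rs  33490' as an int."""
    # One digit, then any run of digits and commas (commas are an assumption).
    match = re.search(r"\d[\d,]*", raw)
    if match is None:
        raise ValueError("no digits found in %r" % raw)
    return int(match.group().replace(",", ""))


print(parse_price("Rs  33490"))
print(parse_price("Rs 1,25,000"))
```

Returning an int avoids the trailing .0 and keeps whole-rupee prices exact.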
answered 2015-04-18T22:56:20.867
1

Try this:

import requests
from bs4 import BeautifulSoup

res = requests.get('http://www.snapdeal.com/products/computers-laptops?sort=plrty&')
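soup = BeautifulSoup(res.text, 'html.parser')
# Select every div carrying the product-price class
price_divs = soup.find_all('div', {'class': 'product-price'})

for price_div in price_divs:
    # The first nested div holds the price text
    child_div = price_div.find('div')
    print(child_div.text)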
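
For context, the attempt in the question fails because find_all() returns a list-like ResultSet, and attributes such as .children exist only on the individual tags inside it. A small sketch on made-up markup (the HTML is an assumption, not the real page):

```python
from bs4 import BeautifulSoup

# Made-up markup (an assumption) with two unclassed child divs.
html = '<div class="product-price"><div>Rs  33490</div><div>extra</div></div>'
soup = BeautifulSoup(html, "html.parser")
price_divs = soup.find_all("div", class_="product-price")

# The ResultSet itself has no .children, so this raises AttributeError.
try:
    price_divs.children
    failed = False
except AttributeError:
    failed = True

# Each element of the ResultSet is a Tag and can be searched on its own.
child_texts = [
    child.get_text()
    for child in price_divs[0].find_all("div", recursive=False)
]

print(failed)
print(child_texts)
```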
answered 2014-09-18T09:04:46.210
1

This gets the text within that div, stripped clean:

import requests
from bs4 import BeautifulSoup
res = requests.get('http://www.snapdeal.com/products/computers-laptops?sort=plrty&')
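soup = BeautifulSoup(res.text, 'html.parser')
price = soup.find_all('div', class_="product-price")

for p in price:
    # Re-parse each match, then step from the outer div into its first nested div
    soupInner = BeautifulSoup(str(p), 'html.parser')
    print(soupInner.find('div').find('div').get_text().strip())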
answered 2014-09-18T09:05:17.433