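I'm still new to Python, and I'm trying to use this Flipkart scraper.
I tried to add a "price" field, but it keeps giving me the error "IndexError: list index out of range".
My goal with this scraper is to pull product information, ratings, prices, specs, image URLs, and so on from Flipkart. So far this has been a challenging goal for me... but I think I can do it if I get the right help and learn more about Python.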
import requests
from urllib.request import urlopen as req
from bs4 import BeautifulSoup as soup

filename = "mobiles.csv"
f = open(filename, "w")
headers = "product_name, specs, rating, price\n"
f.write(headers)

for i in range(0, 200):
    url = 'https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off' + '&page=' + str(i)
    print(url)
    client = req(url)
    html = client.read()
    client.close()
    page_soup = soup(html, "html.parser")
    containers = page_soup.findAll("div", {"class": "col col-7-12"})
    for container in containers:
        price_container = container.findAll('div', {"class": "_1vC4OE _2rQ-NK"})
        price = price_container[0].text
        name_container = container.findAll("div", {"class": "_3wU53n"})
        product_name = name_container[0].text
        rate_container = container.findAll("div", {"class": "hGSR34"})
        if not rate_container:
            rating = "none"
        else:
            rating = rate_container[0].text
        specs_container = container.findAll("ul", {"class": "vFw0gD"})
        specs = specs_container[0].text
        f.write(product_name.replace(",", "|") + "," + specs + "," + rating + "," + price + "\n")

f.close()
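As a side note on the output step: the script builds each CSV line by hand and only replaces commas in the product name, so a comma inside the specs or the price would shift the columns. A minimal sketch of one alternative, using the standard library's csv.writer, which quotes such fields automatically. The function name write_rows and the sample row are made up for illustration and are not part of the original scraper.

```python
import csv
import io


def write_rows(handle, rows):
    """Write a header row plus product rows to an open text handle.

    csv.writer quotes any field that contains a comma, so no manual
    replace(",", "|") is needed. write_rows is a hypothetical helper name.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["product_name", "specs", "rating", "price"])
    writer.writerows(rows)


# Made-up sample data, only to show how commas inside fields are handled.
buffer = io.StringIO()
write_rows(
    buffer,
    [("Phone A (Black, 64 GB)", "4 GB RAM, 6.1 inch display", "4.3", "9,999")],
)
print(buffer.getvalue())
```

With a real file, the handle would presumably be opened as open(filename, "w", newline="", encoding="utf-8"), which is the form the csv documentation recommends.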
Running the script prints the following:
https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=0
Traceback (most recent call last):
  File "C:\Users\HOLES\Desktop\flipkart_web_scraper-master\flipkart_web_scraper-master\flipkart.py", line 24, in <module>
    price = price_container[0].text
IndexError: list index out of range
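The traceback points at price_container[0], which means findAll returned an empty list for at least one container, so there is no element at index 0. Why the list is empty is not clear from the post alone. One possibility is that the class names in the script no longer match the HTML the site serves, but that is an assumption. The script already guards the rating lookup against this case, and the same idea can be sketched as a small helper. The name first_text is hypothetical, and the SimpleNamespace objects are stand-ins for the tags that findAll returns, so the sketch runs without network access.

```python
from types import SimpleNamespace


def first_text(matches, default="none"):
    """Return the .text of the first match, or default if the list is empty.

    first_text is a hypothetical helper, not part of BeautifulSoup.
    """
    return matches[0].text if matches else default


# Stand-ins for findAll results: a list of objects with a .text attribute.
found = [SimpleNamespace(text="9,999")]
missing = []

print(first_text(found))
print(first_text(missing))
```

In the scraper this would be used as price = first_text(container.findAll('div', {"class": "_1vC4OE _2rQ-NK"})), and likewise for the name and specs lookups. If every row then comes back as "none", that would suggest the selectors do not match the page at all, rather than a few products lacking a price.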