
I'm new to Python, and I'm trying to use this Flipkart scraper.

I tried to add a "price" field, but it keeps giving me the error "IndexError: list index out of range".

My goal with this scraper is to pull product information from Flipkart: name, rating, price, specs, image URLs, and so on. So far this has been a challenging goal for me... but I think that with the right help, and by learning more about Python, I can do it.

import requests
from urllib.request import urlopen as req
from bs4 import BeautifulSoup as soup

filename = "mobiles.csv"
f = open(filename, "w")
headers = "product_name, specs, rating, price\n"
f.write(headers)


for i in range(0, 200):
    url = 'https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'+'&page='+str(i)
    print(url)
    client = req(url)
    html = client.read()
    client.close()
    page_soup = soup(html, "html.parser")
    containers = page_soup.findAll("div",{"class":"col col-7-12"})
    for container in containers:
        
        
        price_container = container.findAll('div',  {"class":"_1vC4OE _2rQ-NK"})

        price = price_container[0].text

        name_container = container.findAll("div", {"class":"_3wU53n"})
        product_name = name_container[0].text
        
        rate_container = container.findAll("div", {"class":"hGSR34"})
        if(not(rate_container)):
            rating = "none"
        else:
            rating = rate_container[0].text

        specs_container = container.findAll("ul", {"class":"vFw0gD"})
        specs = specs_container[0].text

        f.write(product_name.replace(",", "|") + ","  +specs + "," +rating + "," +price + "\n")
f.close()

It prints the following:

https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=0
Traceback (most recent call last):
  File "C:\Users\HOLES\Desktop\flipkart_web_scraper-master\flipkart_web_scraper-master\flipkart.py", line 24, in <module>
    price = price_container[0].text
IndexError: list index out of range

1 Answer


The problem with your code is the container selection in this line:

containers = page_soup.findAll("div",{"class":"col col-7-12"})

If you print containers[0] and search it for _1vC4OE _2rQ-NK, you will not find anything. You can fix this by selecting a broader <div>:

containers = page_soup.findAll("div",{"class":"_1UoZlX"})
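With the broader card <div>, it also helps to guard each lookup so that a card missing a field (e.g. no price) yields "none" instead of raising IndexError. Here is a minimal sketch of that guarded extraction, tested on a hand-written HTML snippet; the class names ("_1UoZlX", "_3wU53n", "_1vC4OE _2rQ-NK") are the ones from this question and answer, and Flipkart changes them periodically, so verify them against the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mimicking two product cards;
# the second card deliberately has no price element.
sample_html = """
<div class="_1UoZlX">
  <div class="_3wU53n">Phone X</div>
  <div class="_1vC4OE _2rQ-NK">Rs. 9,999</div>
</div>
<div class="_1UoZlX">
  <div class="_3wU53n">Phone Y</div>
</div>
"""

def extract_products(html):
    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in page_soup.find_all("div", {"class": "_1UoZlX"}):
        name_tag = card.find("div", {"class": "_3wU53n"})
        price_tag = card.find("div", {"class": "_1vC4OE _2rQ-NK"})
        rows.append((
            name_tag.text if name_tag else "none",
            # find() returns None when the element is absent,
            # so there is no IndexError for cards without a price.
            price_tag.text if price_tag else "none",
        ))
    return rows

print(extract_products(sample_html))
```

Using find() plus a None check (instead of findAll(...)[0]) is what prevents the crash when a sponsored card or ad block lacks one of the fields.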
answered May 20, 2020 at 19:54