scrapy - Splash does not render the webpage completely

Question

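I am trying to scrape this website with scrapy + splash: https://www.teammitsubishihartford.com/new-inventory/index.htm?compositeType=new. However, I cannot extract any data from it. When I render the page through the Splash API (in the browser), I can see that the site has not fully loaded: the Splash render returns an image of a partially loaded page. How can I get the site to render completely?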

score 0 · Accepted Answer

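@Vinu Abraham, if your requirement is not specific to scrapy + splash, you can use Selenium. This problem comes up when scraping dynamic sites, where the content is filled in by JavaScript after the initial page load. Here is a code snippet for reference.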

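import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# URL of the page we want to scrape
url = 'https://www.*******/drugs-all-medicines'

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
try:
    driver.get(url)
    # Crude fixed pause so the page's JavaScript can finish rendering
    time.sleep(5)
    html = driver.page_source
finally:
    # Always close the browser, even if loading fails
    driver.quit()

# Parse the rendered HTML and pick out the container div
soup = BeautifulSoup(html, "html.parser")
all_divs = soup.find('div', {'class': 'style__container___1i8GI'})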

If you find a solution for this using scrapy, please let me know as well.
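On the Splash side, one thing that may be worth trying (a sketch, not a confirmed fix for this particular site) is to ask Splash to wait after the page loads before it returns the result. Splash's HTTP API exposes a `render.html` endpoint that accepts `url`, `wait` and `timeout` arguments. The example below only builds such a request URL with the standard library. The endpoint address `http://localhost:8050` and the helper names `build_render_url` and `fetch_rendered_html` are assumptions for illustration, and the 5 second wait is a guess that would need tuning.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Splash instance is listening locally on its default port.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def build_render_url(page_url, wait=5.0, timeout=60.0, endpoint=SPLASH_ENDPOINT):
    """Return a Splash render.html URL that waits `wait` seconds after load."""
    # Splash rejects requests whose wait is not shorter than the timeout.
    if wait >= timeout:
        raise ValueError("wait must be smaller than timeout")
    query = urlencode({"url": page_url, "wait": wait, "timeout": timeout})
    return f"{endpoint}?{query}"


def fetch_rendered_html(page_url, **kwargs):
    """Fetch the rendered HTML through Splash (needs a running Splash server)."""
    # Give the HTTP call some slack beyond Splash's own render timeout.
    with urlopen(build_render_url(page_url, **kwargs), timeout=90) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    target = ("https://www.teammitsubishihartford.com/new-inventory/"
              "index.htm?compositeType=new")
    # Only prints the request URL; call fetch_rendered_html(target) to render.
    print(build_render_url(target))
```

With scrapy-splash, the same idea would be passing `args={'wait': 5}` to `SplashRequest`. If a fixed wait is still not enough, the page may be loading its data through later requests that a simple delay does not cover, and the Selenium approach above may be the more practical route.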
