1

I have been trying to scrape this website using JSON, as described in the top answer of this post.

The specific element I'm trying to scrape is the following HTML:

<div data-v-5005fea4 class="c-product-tile">

which contains code like:

<span data-v-119c4757 class="major-price inline-block">25</span>

where I want to extract the value 25.

My problem is that whether I use JSON or BeautifulSoup, the lookup returns None, and I don't know where to go from here.



2 Answers

1

Use the API endpoint below, which returns the product data as JSON.

import requests

# Query the search API directly and decode the JSON response
res = requests.get("https://butik.mad.coop.dk/api/search/search?categories=7&lastFacet=categories&pageSize=30").json()

# Print the name, the full price, and the whole-number part of the price
for item in res['products']:
    print(item['displayName'])
    print(item['salesPrice']['amount'])
    print(item['salesPrice']['major'])

Output:

Avocado
25.0
25
Økologiske Avocado
19.95
19
Spisemodne Avocado
23.95
23
Økologiske Avocado
29.5
29
Avocado
9.5
9

To get only the first item's values, use this:

import requests

res = requests.get("https://butik.mad.coop.dk/api/search/search?categories=7&lastFacet=categories&pageSize=30").json()

# Index 0 is the first product in the list
print(res['products'][0]['salesPrice']['amount'])
print(res['products'][0]['salesPrice']['major'])
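The snippets above index straight into the response, so a failed request or a product without a price raises an error that is hard to read. A minimal sketch of a more defensive version follows. The helper names `extract_prices` and `fetch_prices`, the 10-second timeout, and the assumption that the response keeps the `products` / `displayName` / `salesPrice` shape shown in the output above are illustrative choices, not something documented by the site.

```python
API_URL = (
    "https://butik.mad.coop.dk/api/search/search"
    "?categories=7&lastFacet=categories&pageSize=30"
)


def extract_prices(payload):
    """Return (name, amount, major) tuples from an already-decoded response.

    Missing keys yield None instead of raising KeyError.
    """
    rows = []
    for item in payload.get("products", []):
        price = item.get("salesPrice") or {}
        rows.append((item.get("displayName"), price.get("amount"), price.get("major")))
    return rows


def fetch_prices(url=API_URL, timeout=10):
    """Download the JSON document and extract the prices from it."""
    # Imported here so extract_prices can be used offline without requests.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of decoding an error page
    return extract_prices(response.json())
```

Calling `fetch_prices()` returns a list of tuples that can be looped over in the same way as `res['products']` above, assuming the endpoint is still available and unchanged.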
answered 2020-05-05T14:00:43.390
0

As I mentioned in a comment, this data is loaded dynamically by JavaScript, so you cannot get it by fetching the static page.

You can install Selenium: https://selenium-python.readthedocs.io/installation.html

Then you would do this:

from selenium import webdriver
from bs4 import BeautifulSoup as bs

# Start a Firefox session (capital F: webdriver.firefox is a module, not a callable)
driver = webdriver.Firefox()
driver.get("https://butik.mad.coop.dk/frugt-og-groent/groentsager/avocado/c-7")

# Parse the HTML as rendered by the browser, not the raw server response
soup = bs(driver.page_source, 'lxml')

Then you'll see the data you're after:

In [17]: soup.find_all('span',{'class':"major-price inline-block"})
Out[17]: 
[<span class="major-price inline-block" data-v-119c4757="">25</span>,
 <span class="major-price inline-block" data-v-119c4757="">19</span>,
 <span class="major-price inline-block" data-v-119c4757="">23</span>,
 <span class="major-price inline-block" data-v-119c4757="">29</span>,
 <span class="major-price inline-block" data-v-119c4757="">9</span>]
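Because the prices are rendered by JavaScript, `page_source` can be read before the spans exist, in which case the list comes back empty. One way to guard against that is Selenium's explicit wait, sketched below. The CSS selector `span.major-price`, the 10-second timeout, and the helper names `to_int_prices` and `scrape_major_prices` are assumptions made for illustration.

```python
def to_int_prices(texts):
    """Convert the text of each price span to an int, skipping empty strings."""
    return [int(t.strip()) for t in texts if t.strip()]


def scrape_major_prices(url, timeout=10):
    """Open the page in Firefox, wait for the price spans, and return them as ints."""
    # Imported here so to_int_prices can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one matching span is in the DOM, or time out.
        spans = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "span.major-price"))
        )
        return to_int_prices(span.text for span in spans)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Calling `scrape_major_prices("https://butik.mad.coop.dk/frugt-og-groent/groentsager/avocado/c-7")` should return the major prices as a list of integers, provided the page still uses the same class names.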
answered 2020-05-05T13:40:47.873