This is what I am doing:

import requests
from requests.adapters import HTTPAdapter
from bs4 import BeautifulSoup

HEADERS = {
    'authority': 'www.noon.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document'
}

response = requests.get('https://www.noon.com/uae-en/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905',headers=HEADERS,stream=True)
soup = BeautifulSoup(response.content,'lxml')
results = soup.find_all("div", {"class" : "productContainer"})
result = results[0]

print("https://www.noon.com" + result.a.get('href'))

Output

https://www.noon.com/uae-en

But the expected output is 'https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f'

as you can see in the browser here:

<div class="productContainer"><a class="sc-7vj7do-0 ftlAjW" href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f" id="productBox-N35521717A"><div class="kcs0h5-0 diNcmV grid" title="Samsung Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="e3js0d-1 efqIDW"><div class="productImage" data-qa-id="productImagePLP_Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="lazyload-wrapper"><div class="puv25r-0 hfEfTS"><div class="puv25r-2 hJKuPa"><img alt="Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE " src="https://a.nooncdn.com/t_desktop-pdp-v1/v1605814225/N35521717A_1.jpg"/></div></div></div></div><div class="e3js0d-2 dqjnoR"><div class="tagContainer"></div></div></div><div class="e3js0d-6 iKEZJh"><div class="e3js0d-7 jULUCI"><div class="e3js0d-10 cyUANN"><span class="e3js0d-11 gXshOX">Samsung</span>Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE </div></div><div class="e3js0d-8 jtiosv"><div class="sc-3751lm-0 hSumnU"><div class="sc-3751lm-1 eUJkVt large"><span class="currency">AED</span><strong>819.00</strong></div><div class="sc-3751lm-2 kWnsOk"><span class="oldPrice">AED<!-- --> <!-- -->859</span></div></div></div><div class="e3js0d-9 kDpjlW"><div class="e3js0d-12 gMFqig"><div class="u8zs36-0 kRPdZJ"><img alt="noon-express" height="20px" src="https://a.nooncdn.com/s/app/com/noon/images/fulfilment_express-en.png" width="80px"/></div></div></div></div></div></a></div>
1 Answer

What happens and how to reproduce it

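The website appears to generate this content dynamically: the href in the initial HTML is only a placeholder, and the full product link is added in the browser afterwards.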

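  1. Open the website in a browser.

  2. Open the page source (Ctrl+U) and search for class="productContainer". You will see that the href of the <a> contains only /uae-en. This is what you get when you use requests.

  3. Open the inspector (Ctrl+Shift+I) and examine the same <a>. You will find the dynamically added part, which is what you get if you use selenium.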
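
The difference between steps 2 and 3 can be reproduced offline. The sketch below uses only the standard library and two hypothetical, trimmed-down snippets: one standing in for the raw page source (href set to /uae-en, as described in step 2) and one for the rendered DOM (full href, as in step 3). The class name `ProductLinkParser` and the helper `product_hrefs` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect the href of every <a> nested inside div.productContainer."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a productContainer div
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Count nested divs so we know when the container is closed again
            if self.depth or "productContainer" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def product_hrefs(html):
    parser = ProductLinkParser()
    parser.feed(html)
    return parser.hrefs


# Hypothetical snippets: what the server sends vs. what the inspector shows
raw_html = (
    '<div class="productContainer">'
    '<a href="/uae-en"><div class="grid"></div></a>'
    '</div>'
)
rendered_html = (
    '<div class="productContainer">'
    '<a href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f"><div class="grid"></div></a>'
    '</div>'
)

print(product_hrefs(raw_html))
print(product_hrefs(rendered_html))
```

If the href you extract from `response.text` matches the first case, no parser setting will recover the full link, because it is simply not in the HTML that requests receives.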

Minimal example

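import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escapes
browser = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))

browser.get('https://www.noon.com/uae-en/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905')

# Give the page's JavaScript time to fill in the product links
time.sleep(3)
elements = browser.find_elements(By.XPATH, "//div[contains(@class, 'productContainer')]/a")

# Scroll each product link into view, then print its full href
for element in elements:
    ActionChains(browser).move_to_element(element).perform()
    print(element.get_attribute('href'))

# End the browser session and the driver process
browser.quit()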

Output

https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f
https://www.noon.com/uae-en/product/N41247213A/p?o=ca38c8921770ea2a
https://www.noon.com/uae-en/product/N41247235A/p?o=c97b8bfdc0114cba
https://www.noon.com/uae-en/product/N39790555A/p?o=d7354e20a0bb00ad
https://www.noon.com/uae-en/product/N32046052A/p?o=faea2e69f38bbf6a
...
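
The fixed `time.sleep(3)` in the example is a guess: it may be too short on a slow connection and wasteful on a fast one. Polling until the links actually appear is usually more robust. Selenium ships `WebDriverWait` for this purpose; the sketch below only illustrates the underlying idea with the standard library, so it runs without a browser. `wait_until` and `links_ready` are hypothetical names, and the stand-in condition simply succeeds on its third call.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for a page whose links only show up after a few polls
attempts = {"count": 0}


def links_ready():
    attempts["count"] += 1
    if attempts["count"] >= 3:
        return ["/uae-en/product/N35521717A/p?o=f885efe0b6534e9f"]
    return []


links = wait_until(links_ready, timeout=2.0, interval=0.01)
print(links)
```

With a real browser, the condition would be a function that queries the page for the product links and returns the list, which is empty (falsy) until the links have been added.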

Edit

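You won't get this information with requests by scraping the page source, but there is another way.

You can call the API with requests and build the links yourself (a simple example that you can customize):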

import requests

url = "https://www.noon.com/_svc/catalog/api/u/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905"
headers = {
    "user-agent": "Mozilla/5.0"
}

response = requests.get(url, headers=headers)
response.raise_for_status()

# Each entry in "hits" describes one product
records = response.json()["hits"]

for record in records:
    offer_code = record["offer_code"]
    sku = record["sku"]
    slug = record["url"]  # renamed so it does not overwrite the API url above
    print(f"https://www.noon.com/uae-en/{slug}/{sku}/p?o={offer_code}")

Output

https://www.noon.com/uae-en/galaxy-m31-dual-sim-blue-6gb-ram-128gb-4g-lte/N35521717A/p?o=f885efe0b6534e9f
https://www.noon.com/uae-en/iphone-12-pro-max-with-facetime-128gb-pacific-blue-5g-international-specs/N41247213A/p?o=ca38c8921770ea2a
https://www.noon.com/uae-en/iphone-12-pro-with-facetime-256gb-pacific-blue-5g-international-specs/N41247235A/p?o=cfab59c09cab747b
...
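
If you want to reuse the link-building step, it can be pulled into a small function and tried without a network call. This is a minimal sketch, assuming the records carry the same three fields as in the example (`url`, `sku`, `offer_code`). The first sample record is reconstructed from the first output line, the second is a made-up incomplete record, and `build_product_url` and `build_product_urls` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.noon.com/uae-en"


def build_product_url(record):
    """Build a product link from one entry of the "hits" list."""
    slug = quote(record["url"], safe="")
    sku = quote(record["sku"], safe="")
    query = urlencode({"o": record["offer_code"]})
    return f"{BASE_URL}/{slug}/{sku}/p?{query}"


def build_product_urls(records):
    """Build links for all records, skipping any that lack a required field."""
    required = ("url", "sku", "offer_code")
    return [
        build_product_url(record)
        for record in records
        if all(record.get(key) for key in required)
    ]


sample_hits = [
    {
        "url": "galaxy-m31-dual-sim-blue-6gb-ram-128gb-4g-lte",
        "sku": "N35521717A",
        "offer_code": "f885efe0b6534e9f",
    },
    {"sku": "N00000000A"},  # made-up incomplete record, skipped
]

for link in build_product_urls(sample_hits):
    print(link)
```

Skipping incomplete records is a design choice for this sketch; raising an error instead may be preferable if a missing field should never happen.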
Answered 2021-01-06T09:58:34.167