This is what I'm doing:
import requests
from requests.adapters import HTTPAdapter
from bs4 import BeautifulSoup
HEADERS = {
    'authority': 'www.noon.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document'
}
response = requests.get('https://www.noon.com/uae-en/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905', headers=HEADERS, stream=True)
soup = BeautifulSoup(response.content, 'lxml')
results = soup.find_all("div", {"class": "productContainer"})
result = results[0]
print("https://www.noon.com" + result.a.get('href'))
Output:
https://www.noon.com/uae-en
But the expected output should be 'https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f'
As you can see here, this is the markup shown in the browser:
<div class="productContainer"><a class="sc-7vj7do-0 ftlAjW" href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f" id="productBox-N35521717A"><div class="kcs0h5-0 diNcmV grid" title="Samsung Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="e3js0d-1 efqIDW"><div class="productImage" data-qa-id="productImagePLP_Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="lazyload-wrapper"><div class="puv25r-0 hfEfTS"><div class="puv25r-2 hJKuPa"><img alt="Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE " src="https://a.nooncdn.com/t_desktop-pdp-v1/v1605814225/N35521717A_1.jpg"/></div></div></div></div><div class="e3js0d-2 dqjnoR"><div class="tagContainer"></div></div></div><div class="e3js0d-6 iKEZJh"><div class="e3js0d-7 jULUCI"><div class="e3js0d-10 cyUANN"><span class="e3js0d-11 gXshOX">Samsung</span>Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE </div></div><div class="e3js0d-8 jtiosv"><div class="sc-3751lm-0 hSumnU"><div class="sc-3751lm-1 eUJkVt large"><span class="currency">AED</span><strong>819.00</strong></div><div class="sc-3751lm-2 kWnsOk"><span class="oldPrice">AED<!-- --> <!-- -->859</span></div></div></div><div class="e3js0d-9 kDpjlW"><div class="e3js0d-12 gMFqig"><div class="u8zs36-0 kRPdZJ"><img alt="noon-express" height="20px" src="https://a.nooncdn.com/s/app/com/noon/images/fulfilment_express-en.png" width="80px"/></div></div></div></div></div></a></div>
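One way to narrow this down is to run the same kind of lookup against the markup copied from the browser, without any network request. The following is only a minimal sketch: it uses the standard library instead of BeautifulSoup so it runs without extra packages, `BROWSER_HTML` is a shortened copy of the snippet above with the inner content trimmed, and `ProductLinkParser` and `product_links` are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.noon.com"

# Shortened copy of the markup shown in the browser (inner content trimmed).
BROWSER_HTML = (
    '<div class="productContainer">'
    '<a class="sc-7vj7do-0 ftlAjW" '
    'href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f" '
    'id="productBox-N35521717A">'
    '<div class="productImage"></div>'
    '</a></div>'
)


class ProductLinkParser(HTMLParser):
    """Hypothetical helper: collects the href of every <a> inside div.productContainer."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside a productContainer; 0 means outside
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at a productContainer, keep counting for nested divs.
            if self.depth or "productContainer" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def product_links(html):
    """Return absolute URLs for all links found inside product containers."""
    parser = ProductLinkParser()
    parser.feed(html)
    # urljoin resolves the relative href against the site root.
    return [urljoin(BASE_URL, href) for href in parser.hrefs]


print(product_links(BROWSER_HTML))
```

On the browser markup this yields the expected product URL, which would suggest the difference lies in the HTML that `requests` actually receives rather than in the lookup itself. Feeding `response.text` to `product_links` and comparing the result would be one way to check that assumption.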