web-scraping - Using the requests-html Python library, how do I scroll to the end of the page?

Question

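The problem is that I need to scrape data, but the full data set is only generated as I scroll.

If I scrape before scrolling, I get only some of the data, not all of it.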

from requests_html import AsyncHTMLSession

link="https://www.daraz.com.np/catalog/?q={}"
asession = AsyncHTMLSession()
async def get_daraz():
    r = await asession.get(link.format("mouse"))
    await r.html.arender()
    return r.html
results = asession.run(get_daraz)


items_div=results[0].xpath('//*[@id="root"]/div/div[2]/div[1]/div/div[1]/div[2]/div')

for item in items_div:
    print(item.xpath('//div/div/div[1]/div/a/img',first=True))

The code above only returns the images for the first three items.

score 0 · Accepted Answer

You could look at the pyautogui library to scroll the web page. Selenium would also work, but many websites block it.

import pyautogui

pyautogui.moveTo(200, 200)  # move the mouse to a blank spot on the screen, given as (x, y) coordinates
pyautogui.click(200, 200)   # click the screen at the coordinates of your choice
pyautogui.scroll(-100)      # negative values scroll down; use a larger magnitude to load the whole page
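
Since the question is about requests-html itself, it may also be worth trying the `scrolldown` and `sleep` arguments that `arender()` accepts, which page down inside the headless browser while rendering. The following is only a sketch under assumptions: 20 page-downs with a one-second pause are guessed values that may or may not reach the end of this particular page, and `build_url`, `scrape`, and the extra parameters on `get_daraz` are hypothetical names introduced for illustration.

```python
LINK = "https://www.daraz.com.np/catalog/?q={}"


def build_url(query):
    """Fill the search term into the catalog URL from the question."""
    return LINK.format(query)


async def get_daraz(asession, query="mouse", scrolls=20, pause=1):
    """Fetch the page, render it, and page down `scrolls` times."""
    r = await asession.get(build_url(query))
    # scrolldown: how many times Page Down is pressed during rendering.
    # sleep: seconds to wait after each press so lazy content can load.
    # Both values are assumptions; tune them for the target page.
    await r.html.arender(scrolldown=scrolls, sleep=pause)
    return r.html


def scrape(query="mouse"):
    """Run the coroutine with requests-html and return the rendered HTML."""
    # Imported here so the helpers above load without requests-html installed.
    from requests_html import AsyncHTMLSession

    asession = AsyncHTMLSession()
    # run() calls each argument with no parameters, hence the lambda.
    results = asession.run(lambda: get_daraz(asession, query))
    return results[0]
```

Calling `scrape("mouse")` should give back an HTML object that can be queried with the same `xpath` expressions as in the question. If items are still missing, raising `scrolls` or `pause` is the first thing to try.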
