This is my current code. It works, but it eventually runs out of memory before it reaches the bottom of the page:
import time

from selenium.common.exceptions import StaleElementReferenceException, WebDriverException
from selenium.webdriver.common.by import By

# `driver`, `scroll_pause_time` and the output file `file2` are set up earlier in the script.
# Scroll by the viewport height (window.innerHeight); window.screen.height is the
# physical screen height, which can be larger than the visible area and skip items.
screen_height = driver.execute_script("return window.innerHeight;")
i = 1
# dict keys keep insertion order and drop duplicates while links are collected
links_seen = {}
while True:
    # scroll down one viewport height each time
    driver.execute_script(f"window.scrollTo(0, {screen_height * i});")
    i += 1
    time.sleep(scroll_pause_time)
    try:
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
    except WebDriverException:
        break
    #driver.execute_script("window.scrollTo(0, {screen_height}*{i} + {extrascroll});".format(screen_height=screen_height, extrascroll=extrascroll, i=i))
    time.sleep(scroll_pause_time)
    try:
        links = driver.find_elements(By.CSS_SELECTOR, "a[class='styles__StyledLink-sc-l6elh8-0 ekTmzq Asset--anchor']")
    except WebDriverException:
        break
    for link in links:
        try:
            href = link.get_attribute("href")
        except StaleElementReferenceException:
            # the element was removed from the DOM while scrolling; skip it
            continue
        # only record (and print) links that have not been seen yet
        if href and href not in links_seen:
            print(href)
            links_seen[href] = None
            #file.write(href + '\n')
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    #if i == 15:
    #    break
    if screen_height * i > scroll_height:
        break
# links_seen already contains each link once, in the order it was found
for link in links_seen:
    file2.write(link + '\n')
file2.close()
As the code shows, OpenSea loads items dynamically as you scroll down, so I have to keep collecting links while scrolling down the page.
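Because items are added as you scroll, one option is to write each link to disk the first time it is seen, instead of holding every result (including the repeats returned on each pass) in a list until the end. A minimal sketch, assuming the hrefs have already been read from the elements; `write_new_links` is a hypothetical helper name, not part of Selenium:

```python
import io
from typing import Iterable, Optional, Set, TextIO


def write_new_links(hrefs: Iterable[Optional[str]], seen: Set[str], out: TextIO) -> int:
    """Write each href not already in `seen` to `out`, one per line.

    `seen` is updated in place, so calling this once per scroll step never
    writes the same link twice. Returns how many new links were written.
    """
    written = 0
    for href in hrefs:
        # get_attribute("href") returns None for anchors without an href
        if not href or href in seen:
            continue
        seen.add(href)
        out.write(href + "\n")
        written += 1
    return written


if __name__ == "__main__":
    seen: Set[str] = set()
    buf = io.StringIO()
    # Simulated results of two scroll steps; the second overlaps the first.
    write_new_links(["https://example.com/a", "https://example.com/b"], seen, buf)
    write_new_links(["https://example.com/b", "https://example.com/c", None], seen, buf)
    print(buf.getvalue(), end="")
```

Inside the scroll loop this would be called once per step with the hrefs of the current `links` and with `file2` as `out`. It only limits the memory used by the Python side (a set of unique URLs); it does not reduce what the browser tab itself holds.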
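The site I'm trying to scrape is website. Is there a better way to scrape this site? I'm still new to coding and Selenium, so I don't know a whole lot, and I'm open to trying other web-scraping tools that might help. I've run out of ideas at this point.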
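One alternative to comparing against `document.body.scrollHeight`, which keeps growing while the page loads more items, is to stop once several consecutive scroll steps find no new links. A sketch of that stopping rule, with the Selenium-specific work passed in as a callable; `scroll_until_idle`, `step`, and the default limits are hypothetical choices for illustration:

```python
from typing import Callable


def scroll_until_idle(step: Callable[[], int], max_idle_steps: int = 3, max_steps: int = 10_000) -> int:
    """Call `step` until it reports zero new items `max_idle_steps` times in a row.

    `step` is expected to scroll once, collect links, and return how many of
    them were new. Returns the total number of steps performed.
    """
    idle = 0
    steps = 0
    while steps < max_steps and idle < max_idle_steps:
        new_items = step()
        steps += 1
        # reset the idle counter whenever something new shows up
        idle = idle + 1 if new_items == 0 else 0
    return steps
```

Here `step` would scroll one viewport (for example with `driver.execute_script("window.scrollBy(0, window.innerHeight);")`), wait `scroll_pause_time`, read the anchors, and return the count of previously unseen hrefs. `max_steps` is only a safety cap so a page that never stops producing items cannot loop forever.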
If you need more information, let me know in the comments. (PS: I know OpenSea has an API, but it doesn't offer one for the type of listing I need.)