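For over a year I have been using the code below to scrape certain Kickstarter pages as part of my day job. Nothing malicious or nefarious, I just need to pull some information from the page to help project creators.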
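In the past 4 to 6 months, however, Kickstarter has put some kind of blocker in place that stops me from reaching and scraping the actual page. All I get back is this:
Backer or bot?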
Complete this security check to prove that you’re a human. Once you’ve passed this page, you might need to navigate away from your current screen on Kickstarter to refresh and move on.
To avoid seeing this page again, double-check that JavaScript and cookies are enabled on your web browser and that you’re not blocking them from loading with an extension (e.g., ad blockers).
Can anyone think of a way to get past this check and actually land on the page? Any input would be very helpful.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
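# Ask for the project page to scrape
urlInp = input('What is the project URL? ')
elClass = "rte__content"  # class of the div I want to extract

# Run Chrome headless so no browser window opens
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get(urlInp)
    time.sleep(2)  # crude wait for the page's JavaScript to finish rendering
    html = driver.execute_script("return document.documentElement.outerHTML")
finally:
    driver.quit()  # always close the browser, even if the page load fails

soup = BeautifulSoup(html, 'lxml')
ele = soup.find('div', {'class': elClass})
# Printing the whole document for debugging; this is where the "Backer or bot?" page shows up
print(soup)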
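
For what it's worth, a script can at least tell the two cases apart by looking for the heading of the check page in the returned HTML. This is only a rough sketch: `is_bot_check_page` is a hypothetical helper name, and it assumes the phrase "Backer or bot?" keeps appearing in the block page's markup, which Kickstarter could change at any time. It detects the check page; it does not get past it.

```python
def is_bot_check_page(html: str) -> bool:
    """Return True if the HTML looks like Kickstarter's security check page.

    Assumption: the check page contains the heading text "Backer or bot?".
    The comparison is case-insensitive to tolerate minor markup changes.
    """
    return "backer or bot?" in html.lower()
```

With this in place, the script could call `is_bot_check_page(html)` right after fetching the page and report the block clearly instead of dumping the whole document.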