
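I want to run a search with Selenium and then click the "More results" button at the end of the DDG search results.

Once all results for the query are shown, DDG no longer displays that button.

When the button is not there, I want to exit the try loop.

Here is what I am trying now. I also tried these two options before: if len(button_element) > 0: button_element.click() and if button_element is not None: button_element.click().

I want a solution that uses Selenium, so that the browser is visible, because that helps with debugging.

Here is my code, with a reproducible example: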

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup

    browser = webdriver.Chrome()        
    browser.get("https://duckduckgo.com/")
    search = browser.find_element_by_name('q')
    search.send_keys("this is a search" + Keys.RETURN)
    html = browser.page_source

    try:
        button_element = browser.find_element_by_class_name('result--more__btn')

        try:
            button_element.click()
        except SystemExit:
            print("No more pages")

    except:
        pass
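Why the attempts above do not work: `find_element` returns a single WebElement (which has no `len()`) and raises `NoSuchElementException` when nothing matches, so the `is not None` check never sees `None`, and `except SystemExit` never catches that exception. The plural `find_elements` returns a list that is simply empty when nothing matches, which is what a `len(...) > 0` test needs. Below is a minimal sketch of that idea, assuming Selenium 4's `find_elements(by, value)`; the helper name `click_more_until_gone`, the 1-second pause and the 50-click cap are hypothetical choices, and the class name `result--more__btn` is the one from the question (DDG's markup may have changed since).

```python
import time

# Selenium's By.CLASS_NAME is the plain string "class name", so this locator
# can be passed to a real WebDriver without importing selenium here.
MORE_BUTTON = ("class name", "result--more__btn")  # class name from the question


def click_more_until_gone(driver, locator=MORE_BUTTON, delay=1.0, max_clicks=50):
    """Click the first element matching `locator` until none is found.

    Hypothetical helper. driver.find_elements(by, value) returns a list,
    empty when nothing matches, so `if not buttons` is the question's
    len(...) > 0 test applied to the plural method.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:  # cap so a button that never disappears cannot loop forever
        buttons = driver.find_elements(*locator)
        if not buttons:
            break  # no "More results" button left: leave the loop
        buttons[0].click()
        clicks += 1
        time.sleep(delay)  # crude fixed pause so the next batch of results can render
    return clicks
```

With a real browser this would run right after the search, e.g. `click_more_until_gone(browser)`. It only pauses a fixed time between clicks; the WebDriverWait approach in the answers below waits for the button to become clickable instead.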

3 Answers


To click the More Results button at the end of the search results with Selenium WebDriver, you have to induce WebDriverWait for element_to_be_clickable(), and you can use the following Locator Strategy:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from selenium.common.exceptions import TimeoutException

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # Selenium 4 takes the chromedriver path through Service (executable_path= was removed)
    driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
    driver.get('https://duckduckgo.com/')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("this is a search" + Keys.RETURN)
    while True:
        try:
            # wait up to 20 s for a clickable "More Results" link, then click it
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.result--more__btn"))).click()
            print("Clicked on More Results button")
        except TimeoutException:
            # the link never became clickable: all results are shown
            print("No more More Results button")
            break
    driver.quit()
    
  • Console output:

    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    No more More Results button
    

You can find a relevant discussion in How to extract the text from the search results of duckduckgo using Selenium Python
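The loop in this answer stops because each `until()` call keeps polling until the link is clickable and raises `TimeoutException` after 20 seconds without it, so the last iteration always waits the full timeout before printing "No more More Results button". Below is a simplified, standard-library-only model of that polling. It is not Selenium's code: the real WebDriverWait passes the driver to the condition, polls every 0.5 s by default, ignores `NoSuchElementException`, and raises `TimeoutException` rather than the built-in `TimeoutError` used here. `wait_until` and `click_all_pages` are hypothetical names.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Simplified model of WebDriverWait(driver, timeout).until(condition).

    Calls condition() repeatedly, sleeping `poll` seconds between tries, and
    returns its first truthy result. Raises TimeoutError once more than
    `timeout` seconds have passed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


def click_all_pages(find_button, timeout=20.0, poll=0.5):
    """Same shape as the answer's loop: wait, click, stop on timeout.

    find_button() stands in for the expected condition: it returns an
    object with .click() when the button is present, else None.
    Returns how many times the button was clicked.
    """
    clicks = 0
    while True:
        try:
            wait_until(find_button, timeout, poll).click()
            clicks += 1
        except TimeoutError:
            break  # plays the role of `except TimeoutException: break`
    return clicks
```

The design point it illustrates: catching the specific timeout exception is what turns "button missing" into a clean loop exit, instead of the question's `except SystemExit`, which never fires.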

answered 2020-06-21T17:14:10.860

You can use the plain-HTML version of DDG at https://duckduckgo.com/html/?q=. That way you can use a plain requests/BeautifulSoup approach and easily get all the pages:

    import requests
    from bs4 import BeautifulSoup


    q = '"centre of intelligence"'
    url = 'https://duckduckgo.com/html/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

    # let requests URL-encode the query (quotes, spaces) via params=
    soup = BeautifulSoup(requests.get(url, params={'q': q}, headers=headers).content, 'html.parser')

    while True:
        # print title, link and snippet of every result on the current page
        for t, a, s in zip(soup.select('.result__title'), soup.select('.result__a'), soup.select('.result__snippet')):
            print(t.get_text(strip=True, separator=' '))
            print(a['href'])
            print(s.get_text(strip=True, separator=' '))
            print('-' * 80)

        # the "Next" button is a small form; no form means this was the last page
        f = soup.select_one('.nav-link form')
        if not f:
            break

        # copy the form's fields (skipping the submit button and nameless inputs) and POST them
        data = {}
        for i in f.select('input'):
            if i.get('type') == 'submit' or not i.get('name'):
                continue
            data[i['name']] = i.get('value', '')

        soup = BeautifulSoup(requests.post('https://duckduckgo.com' + f['action'], data=data, headers=headers).content, 'html.parser')

Prints:

    Centre Of Intelligence - Home | Facebook
    https://www.facebook.com/Centre-Of-Intelligence-937637846300833/
    Centre Of Intelligence . 73 likes. Non-profit organisation. Facebook is showing information to help you better understand the purpose of a Page.
    --------------------------------------------------------------------------------
    centre of intelligence | English examples in context | Ludwig
    https://ludwig.guru/s/centre+of+intelligence
    (Glasgow was "the centre of the intelligence of England" according to the Grand Duke Alexis, who attended the launch of his father Tsar Alexander II's steam yacht there in 1880).
    --------------------------------------------------------------------------------
    Chinese scientists who studied bats in Aus at centre of intelligence ...
    https://www.youtube.com/watch?v=UhcFXXzf2hc
    Intelligence agencies are looking into two Chinese scientists in a bid to learn the true origin of COVID-19. Two Chinese scientists who studied live bats in...
    --------------------------------------------------------------------------------

... and so on.
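The pagination in this answer works by re-submitting DDG's small "Next" form: every input except the submit button is copied into a dict and POSTed to the form's action. The same extraction can be sketched with the standard library's html.parser, swapped in here for BeautifulSoup so it runs without extra packages. The `FormFields` class and the sample form are made up for illustration and are not DDG's real markup; the sketch expects the HTML of a single form.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Collect a form's `action` and the name/value pairs of its <input> tags.

    Mirrors the answer's loop: submit buttons are skipped and a missing
    `value` becomes ''. Inputs without a `name` are skipped as well, since a
    browser would not submit them.
    """

    def __init__(self):
        super().__init__()
        self.action = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input":
            if attrs.get("type") == "submit" or not attrs.get("name"):
                return
            self.data[attrs["name"]] = attrs.get("value") or ""


# Hypothetical "Next page" form, for illustration only.
SAMPLE = """
<form action="/html/" method="post">
  <input type="submit" class="btn" value="Next">
  <input type="hidden" name="q" value="&quot;centre of intelligence&quot;">
  <input type="hidden" name="s" value="30">
  <input type="hidden" name="extra">
  <input type="hidden" value="ignored">
</form>
"""

parser = FormFields()
parser.feed(SAMPLE)
print(parser.action)  # /html/
print(parser.data)
```

On a live page the collected fields would then be sent the way the answer does, with `requests.post('https://duckduckgo.com' + parser.action, data=parser.data, headers=headers)`.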
answered 2020-06-21T17:01:12.277

Use WebDriverWait to wait until the More button is visible:

    wait = WebDriverWait(browser, 15)  # 15 seconds timeout
    wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))

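This example code keeps clicking the More button until there is none left. It uses Firefox; for Chrome, replace webdriver.Firefox() with webdriver.Chrome():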

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.Firefox()
    browser.get("https://duckduckgo.com/")
    search = browser.find_element(By.NAME, 'q')
    search.send_keys("this is a search" + Keys.RETURN)

    wait = WebDriverWait(browser, 15)  # 15 seconds timeout
    while True:
        try:
            # until() returns the element once it is visible
            button_element = wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
            button_element.click()
        except TimeoutException:
            # no visible More button within 15 s: all results are loaded
            break
answered 2020-06-21T16:31:31.287