
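I'm trying to extract the search result count from an IEEE Xplore search, given the search results URL, using Selenium WebDriver. The code below runs without errors, but I'm not sure how to proceed from here.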

Element of interest on the website: [screenshot]

Result of inspecting the element: [screenshot]

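from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
chrome_driver_path = '\\xxxx\chromedriver.exe'
driver = webdriver.Chrome(service=Service(chrome_driver_path))
wait = WebDriverWait(driver, 10)
driver.get(url)
wait.until(presence_of_element_located((By.CLASS_NAME, "strong")))
#result = driver.??????
print(result)
driver.close()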

2 Answers


To print the number of search results, i.e. 184, you can use either of the following locator strategies:

  • Using css_selector and get_attribute("innerHTML"):

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(driver.find_element(By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2)").get_attribute("innerHTML"))
    
  • Using xpath and the text attribute:

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(driver.find_element(By.XPATH, "//div[contains(@class, 'Dashboard-header')]//span//following::span[2]").text)
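Either locator returns the count as a string (for example "184"), not a number. If you need it as an integer, a small helper can strip non-digit characters first. This is a minimal sketch: `parse_result_count` is a hypothetical name, and the thousands separator in the second usage line is an assumption about how larger counts might be displayed.

```python
import re


def parse_result_count(text: str) -> int:
    """Convert scraped count text such as '184' or ' 1,234 ' to an int.

    Hypothetical helper: it assumes the element holds only the count,
    possibly padded with whitespace or thousands separators.
    """
    # Drop everything that is not a digit (commas, spaces, newlines)
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


print(parse_result_count("184"))      # 184
print(parse_result_count(" 1,234 "))  # 1234
```

With Selenium you would pass it the string returned by `.text` or `get_attribute("innerHTML")`.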
    

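Ideally, you should induce WebDriverWait for visibility_of_element_located() so that the element has been rendered before you read it, and then use either of the following Locator Strategies: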

  • Using CSS_SELECTOR and the text attribute:

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.Dashboard-header span span:nth-of-type(2)"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'Dashboard-header')]//span//following::span[2]"))).get_attribute("innerHTML"))
    
  • Console output:

    184
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
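To show what the explicit wait adds over reading the element immediately (or over a fixed `time.sleep()`), here is a simplified, pure-Python stand-in for the polling that `WebDriverWait(driver, 20).until(...)` performs: call a condition repeatedly until it returns something truthy or the timeout expires, then hand back that value. It is not Selenium's implementation. The real class polls every 0.5 seconds by default, ignores `NoSuchElementException` while polling, and raises `TimeoutException` rather than the built-in `TimeoutError` used here. `wait_until` and `count_is_visible` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 20.0, poll: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # condition met: return its result, as until() does
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # pause before checking again


# Usage: a fake condition that becomes true on the third poll,
# standing in for "the result count span is now visible"
attempts = {"n": 0}


def count_is_visible():
    attempts["n"] += 1
    return "184" if attempts["n"] >= 3 else None


print(wait_until(count_is_visible, timeout=5, poll=0.01))  # 184
```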
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python



answered 2021-01-25T22:49:48.853

As dukkee mentioned, check the API, but to answer your question, you can select it like this:

soup.select('div.Dashboard-header.col-12 > span span')[1].get_text()

Find the parent div that has a unique class, then go down to the span.
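To see that "parent div, then nested span" path without a browser, here is a minimal stdlib-only sketch using `xml.etree.ElementTree`, whose limited XPath support can express roughly the same walk. The markup is a hypothetical, simplified snippet inferred from the selectors in this thread, not the real IEEE Xplore page (which is rendered by JavaScript and is not well-formed XML in general), and `result_count_text` is a hypothetical name. Unlike a CSS class selector, ElementTree's `[@class='...']` compares the whole attribute string, and `span[2]` here plays the role of `[1]` on the list returned by `soup.select(...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup inferred from the selectors above;
# the live page is different and must be rendered by a browser first.
SAMPLE = """
<html><body>
  <div class="Dashboard-header col-12">
    <span>Showing <span>1-25</span> of <span>184</span> results</span>
  </div>
</body></html>
""".strip()


def result_count_text(markup: str) -> str:
    root = ET.fromstring(markup)
    # Parent div with the distinctive class, its child span, then the
    # second span inside it (1-based position among sibling spans)
    node = root.find(".//div[@class='Dashboard-header col-12']/span/span[2]")
    if node is None or node.text is None:
        raise ValueError("result count element not found")
    return node.text


print(result_count_text(SAMPLE))  # 184
```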

Example

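from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

url = 'https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=web%20scraping'
# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
driver.get(url)
time.sleep(3)  # fixed pause to let the page's JavaScript render the results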

html = driver.page_source
soup = BeautifulSoup(html,'html.parser')
print(soup.select('div.Dashboard-header.col-12 > span span')[1].get_text())

driver.quit()
answered 2021-01-25T18:22:27.103