python - Python 3 web scraping with selenium: ui-dialog switching problem

Question

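I am a student and quite new to Python. I want to download PDF files from a website (financial reports from different organizations), but I have to go through a few steps first. This is the site I am working with: http://sprawozdaniaopp.mpips.gov.pl/. There are many organizations there, so I think it is best to download the PDFs with a script. First, my script clicks the search button (with no criteria, i.e. find all), which loads the whole list of links. When I click a link, a smaller window appears on the same page (this window relates only to the organization I clicked). And here is the problem: my script cannot switch to that window. I searched online and found driver.switch_to.window and driver.switch_to.frame, but they do not work, or I am not using them correctly. I suspect this is not a frame at all but a ui-dialog(?). When I right-click the window and inspect it, I find something like this: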

<div class="ui-dialog ui-widget ui-widget-content ui-corner-all" tabindex="-1" role="dialog" aria-labelledby="ui-dialog-title-2" style="display: block; z-index: 1002; outline: 0px; height: auto; width: 600px; top: 234.5px; left: 328px;"><div class="ui-dialog-titlebar ui-widget-header ui-corner-all ui-helper-clearfix"><span class="ui-dialog-title" id="ui-dialog-title-2">Szczegółowe informacje o organizacji</span><a href="#" class="ui-dialog-titlebar-close ui-corner-all" role="button"><span class="ui-icon ui-icon-closethick">close</span></a></div><div style="width: auto; min-height: 0px; height: 401.896px;" class="ui-dialog-content ui-widget-content" scrolltop="0" scrollleft="0"> (...)

I do not know how to tell my script to switch to this kind of dialog window(?) so that it looks for the "Sprawozdanie merytoryczne" link for 2016 only.

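The strange thing about this site is that when I inspect a link, for example http://sprawozdaniaopp.mpips.gov.pl/Search/Details/0000000168, it can only be opened with a left click. When I try to open it in a new tab, it fails (why?). The result is: "Server Error in '/' Application. The resource cannot be found. Description: HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly."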

Here is my Python script:

import urllib
import urllib.request
import requests
import re

url = "http://sprawozdaniaopp.mpips.gov.pl/Search/Print/13313?reporttypeId=13"


r = requests.get(url)
#with open(r'C:\Users\username\Desktop\financialreport1.pdf', 'wb') as f:
#       f.write(r.content)

from selenium import webdriver

chrome_path= r"C:\Users\username\AppData\Local\Programs\Python\Python35-32\Scripts\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("http://sprawozdaniaopp.mpips.gov.pl/")

#Button Search called here in polish "Znajdź"
elem = driver.find_element_by_xpath("//*[@id='btnsearch']/span") 
elem.click()

#testing if I'm able to find links on this website 
#elems = driver.find_elements_by_xpath("//a[@href]")
#for elem in elems:
    #print (elem.get_attribute("href"))

#Clicking on first link ( in future I wanted to do it in loop for every link
#elem1 = driver.find_element_by_xpath("//*[@id='form1']/div/div[4]/table/tbody/tr[1]/td[3]/a")
elem1 = driver.find_element_by_css_selector("#form1 > div > div.grid > table > tbody > tr:nth-child(1) > td:nth-child(3) > a")
elem1.click()

#doesn't work
#driver.switch_to.window("#form1 > div > div.grid > table > tbody > tr:nth-child(1) > td:nth-child(3) > a")

#below doesn't work because I can't switch to window where elem2 is placed
elem2 = driver.find_element_by_css_selector("body > div.ui-dialog.ui-widget.ui-widget-content.ui-corner-all > div.ui-dialog-content.ui-widget-content > table:nth-child(4) > tbody > tr:nth-child(7) > td:nth-child(1) > a")
elem2.click()

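I am attaching some screenshots to illustrate my problem. I would be very grateful for any advice or keywords I should search for (maybe the situation is obvious and I just do not understand it as a newbie). Regards!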

[Screenshots: part of the list of organizations; the view opened in a new tab after clicking the yellow link; the PDF file needed]

score 1 · Accepted Answer

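On the website http://sprawozdaniaopp.mpips.gov.pl/, after clicking the Search button and then the first link, we need to wait for the modal box to open and then click the Sprawozdanie merytoryczne link. The ui-dialog is an ordinary div in the same page, not a separate window or frame, so no switch_to call is needed. Here is your own code with a simple tweak: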

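from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

elem1 = driver.find_element_by_css_selector("#form1 > div > div.grid > table > tbody > tr:nth-child(1) > td:nth-child(3) > a")
elem1.click()
# Wait up to 10 seconds for the jQuery UI dialog to be added to the page
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".ui-dialog.ui-widget.ui-widget-content.ui-corner-all")))
driver.find_element_by_link_text("Sprawozdanie merytoryczne").click()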
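
A slightly more defensive variant of the same idea can be sketched as follows. It scopes the link lookup to the dialog and waits until the link is actually clickable rather than only present. The helper names dialog_link_xpath and click_dialog_link are hypothetical (they are not part of Selenium), and the sketch assumes the link text contains no single quote. It does not filter by year, because the dialog markup shown in the question is truncated.

```python
def dialog_link_xpath(link_text, dialog_class="ui-dialog"):
    """Build an XPath for a link with the given text inside a jQuery UI dialog.

    Hypothetical helper. Assumes link_text contains no single quote.
    """
    # Padding @class with spaces matches whole class tokens only, so
    # "ui-dialog" does not also match "ui-dialog-content" or "ui-dialog-titlebar".
    return (
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')]"
        "//a[normalize-space(.)='{text}']"
    ).format(cls=dialog_class, text=link_text)


def click_dialog_link(driver, link_text, timeout=10):
    """Wait for the link inside the dialog to become clickable, then click it."""
    # Imported lazily so the XPath helper can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, dialog_link_xpath(link_text))
    # element_to_be_clickable returns the element once it is visible and enabled.
    link = WebDriverWait(driver, timeout).until(EC.element_to_be_clickable(locator))
    link.click()
    return link
```

With the driver from the question, this would be called as click_dialog_link(driver, "Sprawozdanie merytoryczne") right after elem1.click(). One caveat: if dialogs opened earlier stay in the page as hidden elements, the XPath may match a hidden link first and the wait would then time out, so closing each dialog before opening the next is probably safer when looping over organizations.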
