python - selenium.common.exceptions.InvalidArgumentException: Message: invalid argument when iterating through a list of URLs and passing them as the argument to get()

Question

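I am scraping a page to collect URLs and then using those URLs to scrape a bunch of information. I would like to avoid copying and pasting them every time, but I can't figure out how to make get() work with an object. The first part of my code works well, but when I reach the part that tries to load the URLs, I get the following error message: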

Traceback (most recent call last):
  File "/Users/rcastong/Desktop/imgs/try-creating-object-url.py", line 61, in <module>
    driver4.get(urlworks2) 
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

Here is part of the code:

  #this part works well    
    for number, item in enumerate(imgs2, 1):
            # print('---', number, '---')
        
            img_url = item.get_attribute("href")
            if not img_url:
                print("none")
            else:
                print('"'+img_url+'",')
        
  # the error happens on driver4.get(urlworks2)        
        for i in range(0,30):
            urlworks = img_url[i]
            urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
            driver4 = webdriver.Chrome()
            driver4.get(urlworks2) 
            def check_exists_by_xpath(xpath):
                try:
                    WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
                except TimeoutException:
                    return False
                return True
            
            imgsrc2 = WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, "//p[@data-testid='artistName']/ancestor::a[contains(@class,'ChildrenLink')]")))                                                                                                                 
            for number, item in enumerate(imgsrc2, 1):
                # print('---', number, '---')
                artisturls = item.get_attribute("href")
                if not artisturls:
                    print("none")
                else:
                    print('"'+artisturls+'",')

score 0 · Accepted Answer

This error message...

Traceback (most recent call last):
  .
    driver4.get(urlworks2) 
  .
    self.execute(Command.GET, {'url': url})
  .
    self.error_handler.check_response(response)
  .
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

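...implies that the url passed as the argument to get() was invalid.

Deep Dive

In the first for loop, item.get_attribute("href") returns a single URL string, and img_url is overwritten with it on every iteration. So img_url is still a string, not the list of URLs you assumed it to be. As a result, in the second for loop img_url[i] yields the individual characters of that string, and passing one of them to get() raises InvalidArgumentException: Message: invalid argument.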
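
The overwriting itself can be sketched in plain Python. This is only an illustration: the hrefs list and its example.com URLs are made-up stand-ins for the values that get_attribute("href") would return in the first loop.

```python
# Hypothetical stand-ins for the href values returned in the first loop.
hrefs = ["https://example.com/a", "https://example.com/b"]

for href in hrefs:
    img_url = href  # rebinds img_url on every pass; nothing is accumulated

# After the loop, img_url holds only the last string, not a list of URLs.
print(type(img_url).__name__)
print(img_url)
print(img_url[0])  # indexing a string yields a single character
```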

Demonstration

As an example, the following lines of code:

img_url = 'https://www.google.com/'
for i in range(0,5):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    print(urlworks2)

print:

h
t
t
p
s

Solution

Declare an empty list img_url in the global scope and keep appending each href to it, so that you can iterate over the list later.

img_url = []
for number, item in enumerate(imgs2, 1):
    img_url.append(item.get_attribute("href"))
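
For completeness, here is a hedged sketch of how the collected list might then be consumed. collect_hrefs and visit_all are hypothetical helper names, not part of Selenium; they assume only that each element has get_attribute() and that the driver has get(), as in the question's code. FakeElement and FakeDriver are likewise made up, and exist only so the sketch runs without a browser.

```python
def collect_hrefs(elements):
    """Return the non-empty href of every element as a list of strings."""
    urls = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip elements whose href is None or empty
            urls.append(href)
    return urls


def visit_all(driver, urls):
    """Pass each complete URL string to driver.get(), one at a time."""
    for url in urls:  # iterate over the list itself, not over range() indices
        driver.get(url)


# Hypothetical stand-ins so the sketch runs without a browser.
class FakeElement:
    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


class FakeDriver:
    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


elements = [
    FakeElement("https://example.com/a"),
    FakeElement(None),
    FakeElement("https://example.com/b"),
]
driver = FakeDriver()
urls = collect_hrefs(elements)
visit_all(driver, urls)
print(driver.visited)
```

With real Selenium objects, elements would presumably be imgs2 and driver a webdriver.Chrome() instance. Creating that instance once before the loop, rather than on every iteration as in the question, is a suggestion and not something the original answer covers.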

References

You can find a relevant detailed discussion in:

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument error invoking get() with urls read from text file with Selenium Python
