
I'm getting this error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range

I get this error every time I try to run this code. I just want to print all the URLs on this page. Can someone tell me what I'm doing wrong?

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.tour-india.net/best-of-india.htm")
cities=browser.find_elements_by_css_selector(".posts1>a>h2")
for i in range(0,len(cities)):
    cities1=browser.find_elements_by_css_selector(".posts1>a>h2")[i]
    cities1.click()
    title=browser.find_elements_by_xpath("//title")
    content=browser.find_elements_by_css_selector(".tours_text_innerpage.content_margin_top")
    currentUrl=browser.current_url
    print currentUrl
    browser.back()
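A likely explanation, offered as a hypothesis: `len(cities)` is computed once, from the first page load, but the list that actually gets indexed is fetched again after every `browser.back()`. `find_elements_by_css_selector` returns a list of whatever matches exist at that moment (an empty list if there are none) rather than raising, so if the page is only partly re-rendered when the lookup runs, the fresh list can be shorter than `i + 1` and `[i]` raises `IndexError`. The pure-Python sketch below reproduces that failure mode without a browser; `simulate_loop` and its `page_loads` data are hypothetical stand-ins for the Selenium calls.

```python
def simulate_loop(page_loads):
    """Mimic the question's loop: the range bound comes from the first
    fetch, but each iteration indexes a freshly fetched list.

    page_loads: one list per fetch. A shorter list stands in for a page
    that was only partly rendered when find_elements ran (hypothetical).
    """
    first = page_loads[0]           # like cities = find_elements(...) before the loop
    visited = []
    for i in range(len(first)):
        fresh = page_loads[i + 1]   # like find_elements(...)[i] inside the loop
        visited.append(fresh[i])    # IndexError when len(fresh) <= i
    return visited


full = ["a", "b", "c"]
print(simulate_loop([full, full, full, full]))  # ['a', 'b', 'c']
try:
    # the last fetch only "sees" two headings, so fresh[2] fails
    simulate_loop([full, full, full, ["a", "b"]])
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

This would also fit the intermittent pattern in the full traceback further down, where most iterations succeed and only some indices fail.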

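Edit: While making some changes to the code, I added cities=browser.find_elements_by_css_selector(".posts1>a>h2") again right after the for statement, and suddenly the IndexError stopped appearing. Now I'm confused about why that is.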

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.tour-india.net/best-of-india.htm")
cities=browser.find_elements_by_css_selector(".posts1>a>h2")
for i in range(0,len(cities)):
    cities=browser.find_elements_by_css_selector(".posts1>a>h2")
    cities1=browser.find_elements_by_css_selector(".posts1>a>h2")[i]
    cities1.click()
    title=browser.find_elements_by_xpath("//title")
    content=browser.find_elements_by_css_selector(".tours_text_innerpage.content_margin_top")
    currentUrl=browser.current_url
    print currentUrl
    browser.back()

Edit: my full traceback

>>> import traceback
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> browser.get("http://www.tour-india.net/best-of-india.htm")
>>> cities=browser.find_elements_by_css_selector(".posts1>a>h2")
>>> for i in range(0,len(cities)):      
...     try:
...             #cities=browser.find_elements_by_css_selector(".posts1>a>h2")
...             cities1=browser.find_elements_by_css_selector(".posts1>a>h2")[i]
...             cities1.click()
...             title=browser.find_elements_by_xpath("//title")
...             content=browser.find_elements_by_css_selector(".tours_text_innerpage.content_margin_top")
...             currentUrl=browser.current_url
...             print currentUrl
...             browser.back()
...     except:
...             print traceback.format_exc()
... 
http://www.tour-india.net/golden-triangle.htm
http://www.tour-india.net/golden-triangle-varanasi.htm
http://www.tour-india.net/magnificent-rajasthan.htm
http://www.tour-india.net/northindia-rajasthan-tour.htm
http://www.tour-india.net/north_india_himalaya_tour.htm
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
IndexError: list index out of range

http://www.tour-india.net/southindia-panorma.htm
http://www.tour-india.net/classical-rajasthan-tours.htm
http://www.tour-india.net/rajasthan-forts.htm
http://www.tour-india.net/india-nepal-tour.htm
http://www.tour-india.net/southindia-glimpses.htm
http://www.tour-india.net/enchanting-southindia.htm
http://www.tour-india.net/shekhawati-tours.htm
http://www.tour-india.net/delhi-tour.htm
http://www.tour-india.net/bombay-goa.htm
http://www.tour-india.net/royal-rajasthan.htm
http://www.tour-india.net/grand-mughal.htm
http://www.tour-india.net/north_india_himalaya_tour.htm
http://www.tour-india.net/northindia-images.htm
http://www.tour-india.net/karnataka-heritage.htm
http://www.tour-india.net/leh-ladakh.htm
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
IndexError: list index out of range

http://www.tour-india.net/darjeeling-sikkim.htm
http://www.tour-india.net/himalayan-heritage.htm
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
IndexError: list index out of range

http://www.tour-india.net/rajasthan-goa.htm
http://www.tour-india.net/rajasthan-forts-palaces.htm
http://www.tour-india.net/rajasthan-mp.htm
http://www.tour-india.net/rajasthan-nepal.htm
http://www.tour-india.net/splendid-gujarat.htm

4 Answers


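Assigning the cities variable again inside the for loop solves the problem. I still don't know why, but it works fine. Since nobody else posted an answer, I'm accepting my own.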

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.tour-india.net/best-of-india.htm")
cities=browser.find_elements_by_css_selector(".posts1>a>h2")
for i in range(0,len(cities)):
    cities=browser.find_elements_by_css_selector(".posts1>a>h2")
    cities1=browser.find_elements_by_css_selector(".posts1>a>h2")[i]
    cities1.click()
    title=browser.find_elements_by_xpath("//title")
    content=browser.find_elements_by_css_selector(".tours_text_innerpage.content_margin_top")
    currentUrl=browser.current_url
    print currentUrl
    browser.back()
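A hedged guess at why this works: the extra `find_elements` call before `cities1=...[i]` adds one more WebDriver round trip, which may give the page just enough time to finish re-rendering after `browser.back()`, so the second lookup usually sees the full list. Nothing guarantees that timing. A more explicit approach is to poll until at least `i + 1` headings exist before indexing. The sketch below assumes Python 3 and the `driver.find_elements(by, value)` method, where `"css selector"` is the string value of `By.CSS_SELECTOR`; `wait_for_nth` and `collect_urls` are hypothetical helpers, and Selenium's own `WebDriverWait` is the more usual tool for this kind of wait.

```python
import time


def wait_for_nth(driver, selector, index, timeout=10.0, poll=0.25):
    """Return the element at `index` once enough matches exist.

    Hypothetical helper: polls driver.find_elements("css selector", ...)
    until at least index + 1 matches are present, instead of indexing a
    list that may still be incomplete after driver.back().
    """
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements("css selector", selector)
        if len(found) > index:
            return found[index]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(found)} match(es) for {selector!r}, wanted index {index}"
            )
        time.sleep(poll)


def collect_urls(driver, start_url, selector=".posts1>a>h2"):
    """Click each heading, record the URL it leads to, then go back."""
    driver.get(start_url)
    count = len(driver.find_elements("css selector", selector))
    urls = []
    for i in range(count):
        wait_for_nth(driver, selector, i).click()
        urls.append(driver.current_url)
        driver.back()
    return urls
```

With the question's setup it would be used roughly as `for url in collect_urls(webdriver.Firefox(), "http://www.tour-india.net/best-of-india.htm"): print(url)`.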
answered 2012-09-25T10:47:41.790

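So you click each link, print its URL, and then go back? That's very inefficient. You can get the URLs of all the links on the page much faster with the .get_attribute method.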

links = [i.get_attribute('href') for i in browser.find_elements_by_xpath('.//a')]
for i in links:
    print i
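A slightly more defensive sketch of the same idea: read each link's `href` once, skip anchors that have none (`get_attribute` returns `None` for a missing attribute), and drop duplicates while keeping page order. The question's own output lists `north_india_himalaya_tour.htm` twice, so the same target can appear more than once. `unique_hrefs` is a hypothetical helper that only relies on each element having a `get_attribute` method.

```python
def unique_hrefs(elements):
    """Return each element's href once, in first-seen order, skipping None."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")  # None when the <a> has no href
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs
```

In current Selenium releases, which removed the `find_elements_by_*` helpers, the call would look like `unique_hrefs(browser.find_elements("xpath", ".//a"))`.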

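will print a list of all the links on the page. To select from a smaller area of the page, find the "frame" element you want to select from, and then use

frame.find_elements_by_xpath('.//a') 

instead.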
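The leading `.` matters here: in XPath, a path that starts with `//` searches from the document root even when it is evaluated on an element, so `'//a'` on the "frame" element would still return every link on the page, while `'.//a'` stays inside that element. Selenium needs a live browser, so the sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in to show the scoping; the markup and the `posts1` container are made up, loosely modeled on the question's `.posts1>a>h2` selector. ElementTree supports only a limited XPath subset, so only the relative form is shown.

```python
import xml.etree.ElementTree as ET

# Made-up page: one link in a nav bar, two links inside the posts1 container
PAGE = """
<html><body>
  <div class="nav"><a href="/home">Home</a></div>
  <div class="posts1">
    <a href="/golden-triangle.htm"><h2>Golden Triangle</h2></a>
    <a href="/delhi-tour.htm"><h2>Delhi</h2></a>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
container = root.find(".//div[@class='posts1']")   # the "frame" element

scoped = [a.get("href") for a in container.findall(".//a")]   # only inside the container
everything = [a.get("href") for a in root.findall(".//a")]    # the whole page

print(scoped)      # ['/golden-triangle.htm', '/delhi-tour.htm']
print(everything)  # ['/home', '/golden-triangle.htm', '/delhi-tour.htm']
```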

answered 2012-10-02T19:27:37.693

Use len(cities)-1. len returns a length that is one more than the list length as Python sees it.
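For comparison, a quick check of what `range(len(...))` produces: its indices already stop at `len(cities) - 1`, so lowering the bound by 1 skips the last element rather than fixing an off-by-one. The list below is a stand-in for `cities`.

```python
cities = ["golden-triangle", "delhi-tour", "leh-ladakh"]  # stand-in list

print(list(range(len(cities))))      # [0, 1, 2]  every valid index
print(list(range(len(cities) - 1)))  # [0, 1]     the last item is never visited
print(cities[len(cities) - 1])       # leh-ladakh (the last valid index)
```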

answered 2012-09-24T19:50:34.163
for i in range(len(cities)):

range only needs one argument here :)
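`range` accepts one, two, or three arguments (`stop`, or `start, stop[, step]`); since the start defaults to 0, `range(0, n)` and `range(n)` produce the same indices:

```python
n = 5
print(list(range(0, n)))     # [0, 1, 2, 3, 4]
print(list(range(n)))        # [0, 1, 2, 3, 4]  same indices, start defaults to 0
print(list(range(1, n, 2)))  # [1, 3]           start, stop, step
```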

You could also change your loop to:

for city in cities:
    city.click()
    # rest is the same 

It's more "pythonic".
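One caveat: the elements in `cities` belong to the page as it was first loaded, so after `city.click()` and a navigation back, touching the next element will typically raise `StaleElementReferenceException`. A sketch that keeps the `for ... in` style but loops over plain URL strings captured up front, assuming each heading's parent `<a>` carries the link (as the `.posts1>a>h2` selector suggests) and using the `find_elements(by, value)` and `get_attribute` calls; `visit_each` is a hypothetical helper.

```python
def visit_each(driver, start_url, link_selector=".posts1>a"):
    """Capture link URLs first, then open each one with driver.get().

    Strings cannot go stale, so no element from an earlier page load is
    touched after navigating away (hypothetical helper).
    """
    driver.get(start_url)
    urls = [a.get_attribute("href")
            for a in driver.find_elements("css selector", link_selector)]
    visited = []
    for url in urls:
        if not url:          # skip anchors without an href
            continue
        driver.get(url)
        visited.append(driver.current_url)
    return visited
```

This also avoids the click-and-back round trip entirely, since each page is opened directly by URL.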

answered 2012-09-24T18:37:01.563