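Hello. I am working with this page: http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html

I am trying to use RSelenium to click each player name (they are all links), scrape the individual player's page, go back, and continue with the next player.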

# packages
library(RSelenium)
library(XML)

# navigate to the site
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html")

# find all the player links
player <- remDr$findElements(using = 'xpath', value = "//span/a")

# confirm that there are 20 links
length(player)

# loop that is supposed to open each of the 20 pages, scrape some info, and go back
for (i in 1:20) {
    player <- remDr$findElements(using = 'xpath', value = "//span/a")
    player[[i]]$clickElement()
    Sys.sleep(5)
    urlplayer <- remDr$getCurrentUrl()
    urlplayer2 <- htmlParse(urlplayer)
    hraci <- xpathSApply(urlplayer2, path = "//ul[@class='innerText']/li", fun = xmlValue)
    print(hraci)
    remDr$goBack()
}

I have run this code several times, but after some number of iterations I always get the error: Error in player[[i]] : subscript out of bounds

When I check the value of the iterator after a failed run, it is 7 in one attempt, 12 in another, and other numbers at other times.

I have no idea why I am getting this error, so any help is appreciated!

1 Answer

I suggest a different approach that does not need Selenium:

library(XML)

# parse the list page directly, no browser needed
doc <- htmlParse("http://www.uefa.com/statistics/uefachampionsleague/season=2016/statistics/round=2000634/players/_loadRemaining.html", encoding = "UTF-8")

# keep only the first n players (raise n to fetch more)
n <- 3
hrefs <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href"), n)
players <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlValue), n)

# download each player's page to a temporary file named after the player
for (x in seq_along(hrefs)) {
  download.file(paste0("http://www.uefa.com", hrefs[x]),
                file.path(tempdir(), paste0(players[x], ".html")))
}

# read the tables from the first downloaded page
x <- 1
readHTMLTable(file.path(tempdir(), paste0(players[x], ".html")))
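
If you would rather stay with RSelenium, one plausible cause of the subscript out of bounds error is that the player list has not finished loading after remDr$goBack(), so findElements() returns fewer than i elements. That is an assumption; I have not verified it against the site. Below is a minimal sketch of a polling helper under that assumption. The names wait_for_elements, find_fn, timeout and interval are hypothetical and not part of RSelenium, and fake_find only stands in for the browser so the snippet runs on its own:

```r
# Poll find_fn() until it returns at least n elements or the timeout expires.
# find_fn is any zero-argument function returning a list, for example
#   function() remDr$findElements(using = "xpath", value = "//span/a")
wait_for_elements <- function(find_fn, n, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    elems <- find_fn()
    if (length(elems) >= n) return(elems)
    if (Sys.time() >= deadline) {
      stop("only ", length(elems), " element(s) found, needed ", n)
    }
    Sys.sleep(interval)
  }
}

# Hypothetical stand-in for the browser: the list grows by 5 links per call,
# imitating a page that is still loading.
calls <- 0
fake_find <- function() {
  calls <<- calls + 1
  as.list(seq_len(min(20, calls * 5)))
}

# Wait until at least 12 links are present before indexing the 12th one.
player <- wait_for_elements(fake_find, n = 12, interval = 0.1)
length(player)
```

In the original loop, the bare findElements() call could be swapped for wait_for_elements(function() remDr$findElements(using = 'xpath', value = "//span/a"), n = i), so that player[[i]] is only indexed once enough links exist.
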
answered 2016-04-03T21:04:08.523