1

我对在 R 中使用 selenium 很感兴趣。我注意到这里描述了各种文档WebDriver (Selenium 2) API documentation。有没有用 R 完成任何工作。我将如何处理这个问题。在文档中,它记录了有关运行 selenium 服务器的说明,并且可以使用 Javascript 查询 api。任何帮助将非常感激。

4

4 回答 4

1

Selenium 可以使用JsonWireProtocol访问。

首先通过以下方式从命令行启动 Selenium 服务器:

java -jar selenium-server-standalone-2.25.0.jar

一个新的火狐浏览器可以打开如下:

library(RCurl)
library(RJSONIO)
library(XML)

baseURL<-"http://localhost:4444/wd/hub/"
server<-list(desiredCapabilities=list(browserName='firefox',javascriptEnabled=TRUE))

getURL(paste0(baseURL,"session"),
       customrequest="POST",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       postfields=toJSON(server))

serverDetails<-fromJSON(rawToChar(getURLContent('http://localhost:4444/wd/hub/sessions',binary=TRUE)))
serverId<-serverDetails$value[[1]]$id

导航到谷歌。

getURL(paste0(baseURL,"session/",serverId,"/url"),
       customrequest="POST",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       postfields=toJSON(list(url="http://www.google.com")))

获取搜索框的id

elementDetails<-fromJSON(rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/element"),
       customrequest="POST",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       postfields=toJSON(list(using="xpath",value="//*[@id=\"gbqfq\"]")),binary=TRUE))
       )

elementId<-elementDetails$value

搜索主题

rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/element/",elementId,"/value"),
       customrequest="POST",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       postfields=toJSON(list(value=list("\uE009","a","\uE009",'\b','Selenium api in R')))
       ,binary=TRUE))

返回搜索 html

googData<-fromJSON(rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/source"),
       customrequest="GET",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       binary=TRUE
       ))
       )

获取建议的链接

gxml<-htmlParse(googData$value)
urls<-unname(xpathSApply(gxml,"//*[@class='l']/@href"))

关闭会话

getURL(paste0(baseURL,"session/",serverId),
       customrequest="DELETE",
       httpheader=c('Content-Type'='application/json;charset=UTF-8')
       )
于 2012-09-14T17:14:33.560 回答
0

我想废弃每一轮的足球比赛桌,但不知道该怎么做,如果你愿意为我遮光,我会很感激...... 使用 R 连接 Selenium-Server-Standalone

elementDetails<-fromJSON(rawToChar(getURLContent(paste0(baseURL,"session/",serverId,"/element"),
   customrequest="POST",
   httpheader=c('Content-Type'='application/json;charset=UTF-8'),
   postfields=toJSON(list(using="xpath",value="//*[@id="Match_Table"]")),binary=TRUE))
   )

elementId<-elementDetails$value
于 2013-10-11T15:25:00.843 回答
0

最近开发了 relenium 包(Selenium for R),通过 rJava 包导入selenium。它主要用于网页抓取。免责声明:我是开发人员之一。

于 2013-11-25T22:52:35.043 回答
0

现在,R 包 RSelenium 存在并且可以按如下方式使用:

library(RSelenium)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://www.google.com/")
于 2021-12-15T14:46:57.887 回答