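I am scraping the reviews of a Google Play app in R, but I cannot tell which reviews have no reply.
Let me explain. I want to build a data frame with two columns: one holding the review text, the other holding the app's reply to that review. When a review has no reply, the second column should be empty (NA). However, I only get the replies that exist, so I cannot identify the reviews that were left unanswered. How can I do this?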
Input
Output (what I would like returned)
How can I get this, i.e. identify the reviews with no reply?
Full code
#Loading the rvest package
library(rvest)
library(magrittr) # for the '%>%' pipe symbols
library(RSelenium) # to get the loaded html of
url <- 'https://play.google.com/store/apps/details?id=com.gospace.parenteral&showAllReviews=true'
# starting local RSelenium (this is the only way to start RSelenium that is working for me atm)
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "firefox")
remDr$open()
# go to website
remDr$navigate(url)
# get page source and save it as an html object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()
# column 1: review text
reviews <- html_obj %>% html_nodes(".UD7Dzf") %>% html_text()
# column 2: replies (only reviews that have a reply yield a node here,
# so this vector can be shorter than `reviews`)
reply <- html_obj %>% html_nodes('.LVQB0b') %>% html_text()
# create the df with all the info
review_data <- data.frame(reviews = reviews, reply = reply, stringsAsFactors = FALSE)
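One possible way to keep reviews and replies aligned is to select one parent node per review first, and then look up the review text and the reply inside each parent with `html_node()` (singular). Unlike `html_nodes()`, it returns exactly one result per input node, and `html_text()` gives `NA` where nothing matched. The sketch below runs on a small hand-written HTML snippet rather than the live page. The `.review-card` wrapper class is hypothetical; on the real page you would need to inspect the DOM to find the element that encloses both `.UD7Dzf` and, when present, `.LVQB0b`.

```r
library(rvest)

# Toy HTML standing in for the rendered page source.
# ASSUMPTION: "review-card" is a made-up wrapper class used for illustration;
# the real Google Play markup uses a different (unknown here) container.
toy_html <- '
<div class="review-card">
  <span class="UD7Dzf">Great app</span>
  <div class="LVQB0b">Thanks for the feedback!</div>
</div>
<div class="review-card">
  <span class="UD7Dzf">Crashes on start</span>
</div>
<div class="review-card">
  <span class="UD7Dzf">Useful</span>
  <div class="LVQB0b">Glad it helps.</div>
</div>'

html_obj <- read_html(toy_html)

# One node per review, whether or not it has a reply
cards <- html_nodes(html_obj, ".review-card")

# html_node() returns one result per card; a missing match becomes NA in html_text()
reviews <- html_text(html_node(cards, ".UD7Dzf"))
reply   <- html_text(html_node(cards, ".LVQB0b"))

# Both vectors now have the same length, so the data frame lines up row by row
review_data <- data.frame(reviews = reviews, reply = reply, stringsAsFactors = FALSE)
print(review_data)
```

With the real page, `html_obj` would come from `remDr$getPageSource()` as in the code above, and only the container selector would need to change.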