I'm scraping this website:
http://www.falabella.com.pe/falabella-pe/category/cat40536/Climatizacion?navAction=push
I only need three fields for each product: brand, product name, and price.
I can get them, but I also get the products from a banner that shows similar products from other users, and I don't need those.
However, when I view the page's source code, I can't see those banner products. I think they are loaded through JavaScript.
QUESTION 1: How can I exclude this banner when scraping? It adds products that I don't need, but I can't find that part in the source code.
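A likely reason the banner is missing from "view source" is that the browser's source view shows the HTML as originally delivered, while `remDr$getPageSource()` returns the DOM after JavaScript has run, so the banner is present in what rvest parses. Below is a minimal sketch of two ways to keep it out, run on a small made-up HTML string. The container names `#listado` (product grid) and `.banner-recomendados` (banner) are hypothetical; the real ones would have to be found by inspecting the rendered page. Option A restricts the CSS selector to the grid; option B deletes the banner node from the parsed tree before selecting.

```r
library(rvest)
library(xml2)

# Hypothetical markup: both the grid and the banner reuse the .marca class.
# The real class/id names on the Falabella page may differ.
html_txt <- '
<div id="listado">
  <div class="marca"><a title="Samsung">Samsung</a></div>
  <div class="marca"><a title="LG">LG</a></div>
</div>
<div class="banner-recomendados">
  <div class="marca"><a title="Otra">Otra</a></div>
</div>'

page <- read_html(html_txt)

# Unscoped selector: also returns the banner's brand
all_brands <- page %>% html_nodes(".marca a") %>% html_attr("title")

# Option A: only match .marca elements inside the (hypothetical) #listado grid
grid_brands <- page %>% html_nodes("#listado .marca a") %>% html_attr("title")

# Option B: remove the (hypothetical) banner from the parsed tree, then select as usual.
# xml_remove() modifies the document in place, so it is applied to a separate parse here.
clean <- read_html(html_txt)
xml_remove(html_nodes(clean, ".banner-recomendados"))
clean_brands <- clean %>% html_nodes(".marca a") %>% html_attr("title")
```

Option A is usually simpler when the product grid has a stable container; option B is handy when the unwanted block is easy to identify but the wanted content is scattered.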
QUESTION 2: When extracting prices with the `.precio1` selector, the first element I get is `"\n\t\t\t\tSubtotal InternetS/. 0"`.
I can't see that in the source code either. How can I avoid scraping it?
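The `Subtotal Internet` value looks like it comes from another JavaScript-rendered element (for example a shopping-cart widget) that reuses the `.precio1` class; that is an assumption, not something confirmed from the page. A minimal sketch on made-up HTML: `html_text(trim = TRUE)` strips the leading `\n\t\t\t\t`, and the subtotal can then be dropped by its text, or avoided entirely by scoping the selector to a product container. The `.carrito` and `#listado` names are hypothetical.

```r
library(rvest)

# Hypothetical markup reproducing the symptom: a cart widget reuses .precio1.
# "\t" inside this R string is a real tab character.
html_txt <- '
<div class="carrito"><span class="precio1">
\t\t\t\tSubtotal InternetS/. 0</span></div>
<div id="listado">
  <span class="precio1">S/. 1,299.00</span>
  <span class="precio1">S/. 899.00</span>
</div>'

page <- read_html(html_txt)

# trim = TRUE removes leading/trailing whitespace such as "\n\t\t\t\t"
precios <- page %>% html_nodes(".precio1") %>% html_text(trim = TRUE)

# Filter out the cart subtotal by its text
precios <- precios[!grepl("Subtotal", precios, fixed = TRUE)]

# Alternative: only take prices inside the (hypothetical) product grid
grid_precios <- page %>% html_nodes("#listado .precio1") %>% html_text(trim = TRUE)

# Optional: turn "S/. 1,299.00" into a number (drop the currency prefix and thousands separator)
precios_num <- as.numeric(gsub(",", "", sub("S/.", "", precios, fixed = TRUE), fixed = TRUE))
```

Scoping the selector is more robust than filtering on the word "Subtotal", since it does not depend on the exact wording of unrelated widgets.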
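```r
library(RSelenium)
library(rvest)

# Start a Selenium server and a browser. checkForServer() and startServer()
# are defunct in current RSelenium; rsDriver() starts both and returns the client.
rD <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- rD$client

# Navigate to the page; the browser executes the page's JavaScript
remDr$navigate("http://www.falabella.com.pe/falabella-pe/category/cat40536/Climatizacion?navAction=push")
page_source <- remDr$getPageSource()

# Parse the rendered HTML once and reuse it (html() is deprecated; use read_html())
page <- read_html(page_source[[1]])

Climatizacion_marcas1 <- page %>%
  html_nodes(".marca") %>%
  html_nodes("a") %>%
  html_attr("title")

Climatizacion_producto1 <- page %>%
  html_nodes(".detalle") %>%
  html_nodes("a") %>%
  html_attr("title")

Climatizacion_precio1 <- page %>%
  html_nodes(".precio1") %>%
  html_text()

# Close the browser and stop the Selenium server
remDr$close()
rD$server$stop()
```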
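Extracting brands, names, and prices as three independent vectors means any extra element (banner item, cart subtotal) or any missing field shifts the vectors out of alignment. A common alternative is to select one node per product card first and then extract each field relative to that card. This is a minimal sketch on made-up HTML; the card selector `#listado .item-producto` is hypothetical and would need to be replaced with the real container found in the rendered page.

```r
library(rvest)

# Hypothetical markup: one .item-producto card per product inside #listado.
# The second card deliberately has no price.
html_txt <- '
<div id="listado">
  <div class="item-producto">
    <div class="marca"><a title="Samsung">Samsung</a></div>
    <div class="detalle"><a title="Aire acondicionado 12000 BTU">...</a></div>
    <span class="precio1">S/. 1,299.00</span>
  </div>
  <div class="item-producto">
    <div class="marca"><a title="LG">LG</a></div>
    <div class="detalle"><a title="Ventilador de pie">...</a></div>
  </div>
</div>'

page <- read_html(html_txt)

# One node per product card inside the grid (selector is hypothetical)
cards <- page %>% html_nodes("#listado .item-producto")

# html_node() (singular) returns exactly one match per card, or a missing node
# that becomes NA, so the three columns always stay aligned
productos <- data.frame(
  marca    = cards %>% html_node(".marca a") %>% html_attr("title"),
  producto = cards %>% html_node(".detalle a") %>% html_attr("title"),
  precio   = cards %>% html_node(".precio1") %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)
```

Because only cards inside the grid are selected, banner and cart elements are never visited, and a product without a price yields `NA` instead of shifting later prices onto the wrong product.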