I'm using Rcrawler to extract the infobox from Wikipedia pages. I have a list of musicians and I'd like to extract each one's name, date of birth, date of death, instruments, labels, etc. I'd then like to build a data frame with one row per artist and one column per extracted field.
The code below runs without errors, but it doesn't return any results either. The same XPath expressions work when I use rvest on its own.
What is wrong with my code?
library(Rcrawler)

jazzlist <- c("Art Pepper", "Horace Silver", "Art Blakey", "Philly Joe Jones")

Rcrawler(
  Website = "http://en.wikipedia.org/wiki/Special:Search/",
  no_cores = 4, no_conn = 4,
  KeywordsFilter = jazzlist,
  ExtractXpathPat = c(
    "//th",
    "//tr[(((count(preceding-sibling::*) + 1) = 5) and parent::*)]//td",
    "//tr[(((count(preceding-sibling::*) + 1) = 6) and parent::*)]//td"
  ),
  PatternsNames = c("artist", "dob", "dod"),
  ManyPerPattern = TRUE, MaxDepth = 1
)
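
For reference, this is roughly the kind of rvest check I mean when I say the XPath works on its own. It is only a minimal sketch: the HTML fragment is made up to mimic the row layout of an infobox (it is not a real Wikipedia page), and the variable names are just for illustration.

```r
library(rvest)

# Made-up fragment standing in for a Wikipedia infobox (assumption:
# "Born" is the 5th row and "Died" the 6th, as my XPath patterns expect).
html <- '<table class="infobox">
  <tr><th colspan="2">Example Artist</th></tr>
  <tr><td colspan="2">(photo)</td></tr>
  <tr><th colspan="2">Background information</th></tr>
  <tr><th>Birth name</th><td>Example Q. Artist</td></tr>
  <tr><th>Born</th><td>January 1, 1900</td></tr>
  <tr><th>Died</th><td>December 31, 1980</td></tr>
</table>'

page <- read_html(html)

# Same three patterns as in the Rcrawler call above.
headers <- html_text2(html_elements(page, xpath = "//th"))
dob <- html_text2(html_elements(
  page,
  xpath = "//tr[(((count(preceding-sibling::*) + 1) = 5) and parent::*)]//td"
))
dod <- html_text2(html_elements(
  page,
  xpath = "//tr[(((count(preceding-sibling::*) + 1) = 6) and parent::*)]//td"
))

# One row per artist, one column per field.
artist_df <- data.frame(artist = headers[1], dob = dob, dod = dod)
```

With rvest this gives me a one-row data frame per page, which is the shape I'm hoping to get out of Rcrawler for every artist in `jazzlist`.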