I have been using this code to convert a PDF hosted on the web into a CSV file, and it has worked well until now:
library(tabulizer)

# Read every table in the PDF; by default extract_tables() returns a list of character matrices
lst <- extract_tables(file = 'https://www.stoxx.com/document/Reports/SelectionList/2020/November/sl_sxebmp_202011.pdf')

# The first table carries the variable names in its first row, so keep it apart from the rest
d1 <- lst[[1]]
lst2 <- lst[-1]

# Convert the first table to a data frame, promote its first row to column names, then drop that row
d1 <- as.data.frame(d1, stringsAsFactors = FALSE)
names(d1) <- as.character(unlist(d1[1, ]))
d1 <- d1[-1, ]

# Convert the remaining tables to data frames and stack them
lst2 <- lapply(lst2, as.data.frame, stringsAsFactors = FALSE)
d2 <- do.call(rbind, lst2)

# Assign the same column names, then bind everything together
names(d2) <- names(d1)
d3 <- rbind(d1, d2)

write.csv(d3, file = "C:/Users/m3254/OneDrive/Bolsa/Stoxx/STOXX_All_Europe_800_202011.csv")
Today I got these error messages:
> library(tabulizer)
Error: package or namespace load failed for ‘tabulizer’:
.onLoad failed in loadNamespace() for 'tabulizerjars', details:
call: NULL
error: .onLoad failed in loadNamespace() for 'rJava', details:
call: fun(libname, pkgname)
error: JAVA_HOME cannot be determined from the Registry
> #Read
> lst <- extract_tables(file = 'https://www.stoxx.com/document/Reports/SelectionList/2020/November/sl_sxebmp_202011.pdf')
Error in extract_tables(file = "https://www.stoxx.com/document/Reports/SelectionList/2020/November/sl_sxebmp_202011.pdf") :
could not find function "extract_tables"
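In case it helps with diagnosis: the first error suggests that rJava cannot locate a Java installation, and the second error looks like a consequence of the package failing to load. Below is a minimal sketch of how JAVA_HOME could be checked from the R session. The helper `looks_like_java_home` is a hypothetical name introduced here for illustration, and the path in the commented-out `Sys.setenv()` call is a placeholder, not a real location on my machine.

```r
# Check whether JAVA_HOME is visible to this R session (an empty string means it is unset)
java_home <- Sys.getenv("JAVA_HOME")
print(java_home)

# Hypothetical helper: does a directory look like a Java installation?
# It only checks for bin/java.exe on Windows or bin/java elsewhere.
looks_like_java_home <- function(path) {
  exe <- if (.Platform$OS.type == "windows") "java.exe" else "java"
  nzchar(path) && file.exists(file.path(path, "bin", exe))
}
print(looks_like_java_home(java_home))

# If JAVA_HOME is unset or wrong, it can be set for the current session
# before loading rJava. The path below is a placeholder (assumption):
# Sys.setenv(JAVA_HOME = "C:/path/to/your/java")
# library(rJava)
```

If the check prints FALSE, my guess is that Java was removed or updated, or that the installed Java does not match the architecture (32-bit vs 64-bit) of the R build, but I have not confirmed this.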
Can anyone help me?