I am trying to get historical prices for VIX futures by downloading all the CSV files on this page (http://cfe.cboe.com/Products/historicalVIX.aspx). This is the code I am using to do that:
library(XML)
# Extract all links from the page at url
url <- "http://cfe.cboe.com/Products/historicalVIX.aspx"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
free(doc)
# Keep only the links ending in "csv" and prepend the host to complete them.
links <- links[substr(links, nchar(links) - 2, nchar(links)) == "csv"]
links <- paste("http://cfe.cboe.com", links, sep="")
# Perform read.csv on each URL in links, skipping the first two URLs as they are not relevant.
# (Named vix_data rather than c, to avoid masking base::c.)
vix_data <- lapply(links[-(1:2)], read.csv, header = TRUE)
I get the error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
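After further investigation, I realized this is because some of the CSV files are formatted differently. If I load the URL links[9] manually, I see that the first line contains the following disclaimer: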
CFE data is compiled for the .......use of CFE data is subject to the Terms and Conditions of CBOE's Websites.
Most of the other files (e.g. links[8] and links[10]) are fine, so the disclaimer seems to be inserted at random. Is there some R magic that can handle this?
Thank you.
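
For reference, here is a minimal sketch of one way the inconsistent first line might be handled. It assumes that the disclaimer, when present, is always the very first line and always begins with "CFE data" (which is all I have observed so far), and that the real header row follows immediately after it. The helper names read_vix_lines and read_vix_csv are hypothetical, made up for this illustration.

```r
# Hypothetical helper: parse one CSV supplied as a character vector of lines,
# dropping a leading disclaimer line if one is present.
# Assumption: the disclaimer always starts with "CFE data".
read_vix_lines <- function(lines, disclaimer_prefix = "CFE data") {
  if (length(lines) > 0 && startsWith(trimws(lines[1]), disclaimer_prefix)) {
    lines <- lines[-1]  # discard the disclaimer so the header row comes first
  }
  # read.csv accepts the file contents directly through its `text` argument
  read.csv(text = paste(lines, collapse = "\n"), header = TRUE,
           stringsAsFactors = FALSE)
}

# Hypothetical wrapper: fetch the raw lines of one URL, then parse them.
read_vix_csv <- function(u) {
  read_vix_lines(readLines(u, warn = FALSE))
}

# Usage with the `links` vector built above (needs network access):
# vix_data <- lapply(links[-(1:2)], read_vix_csv)
```

Reading the raw lines first means each file is inspected before parsing, so files with and without the disclaimer go through the same call. If the disclaimer wording varies between files, the prefix check would need to be loosened.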