VowpalWabbit writes the raw predictions from a (CS)OAA model as a series of lines like these:
1:-2.31425 2:-3.98557 3:-3.97967 4:-2.63708 5:-3.18749 6:-2.43984 7:-4.99018 8:-3.49138 9:-3.07816 10:-6.15126 11:-6.01152 12:-5.76039 13:-5.13096 14:-5.18472 15:-5.37358 16:-5.24147 17:-5.21512 18:-5.67961 19:-4.62929 20:-4.61404 000db8cd6aef4e5fa459126d36e0fa1f-none
1:-2.65864 2:-3.33924 3:-2.8116 4:-1.83108 5:-2.05677 6:-1.29879 7:-6.7446 8:-3.05036 9:-2.82138 10:-5.19605 11:-4.5119 12:-5.28309 13:-4.35789 14:-4.76992 15:-4.16866 16:-4.6897 17:-3.76224 18:-4.13129 19:-4.4489 20:-4.32605 000e0e58a4cb4a218bbc6cae0b1af201-none
How do I read this into R?
Here is my code:
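## load raw vw (CS)OAA scores
read.vw.oaa.scores <- function (myfile) {
  ## split each line on spaces; the last field is the example tag (id)
  v <- sapply(strsplit(readLines(myfile),' ',fixed=TRUE), function (r) {
    ## split the "class:score" fields into a two-column matrix
    m <- matrix(unlist(strsplit(head(r,-1),':',fixed=TRUE)),ncol=2,byrow=TRUE)
    ## class labels must be exactly 1..k, in order
    stopifnot(identical(1:nrow(m),as.integer(m[,1])))
    c(tail(r,1),m[,2])
  })
  ## sapply returns one column per line, so transpose before converting
  f <- as.data.frame(t(v),stringsAsFactors=FALSE)
  names(f) <- c("id",head(names(f),-1))
  ## every column except the id holds a numeric score
  for (n in tail(names(f),-1))
    f[[n]] <- as.numeric(f[[n]])
  f
}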
Are there any obvious errors or inefficiencies? Is there a better way?
P.S. This data format looks like CRS, but it is not.
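For comparison, here is a rough sketch of a more vectorized variant that avoids the per-line anonymous function. It is only an illustration under some assumptions: every line carries the same number of "class:score" fields, fields are separated by single spaces with no trailing whitespace, and the tag contains no spaces. The name read.vw.oaa.scores2 is made up for this example.

```r
## Sketch only: vectorized reader for raw vw (CS)OAA scores.
## Assumes a constant number of "class:score" fields per line,
## followed by exactly one tag (id) field.
read.vw.oaa.scores2 <- function (myfile) {
  fields <- strsplit(readLines(myfile), " ", fixed = TRUE)
  n <- lengths(fields)
  stopifnot(length(unique(n)) == 1L)       # same width on every line
  k <- n[1] - 1L                           # number of classes
  ## one row per input line; the last column is the tag
  m <- matrix(unlist(fields), ncol = k + 1L, byrow = TRUE)
  pairs <- m[, seq_len(k), drop = FALSE]
  ## class labels must be 1..k in every row (compared column by column)
  labels <- as.vector(sub(":.*$", "", pairs))
  stopifnot(all(labels == rep(as.character(seq_len(k)), each = nrow(pairs))))
  ## strip the "class:" prefix and convert all scores in one call
  scores <- matrix(as.numeric(sub("^[^:]*:", "", pairs)), nrow = nrow(pairs))
  colnames(scores) <- paste0("V", seq_len(k))
  data.frame(id = m[, k + 1L], scores, stringsAsFactors = FALSE)
}
```

This should return the same layout as the function above (an id column followed by V1..Vk), but it calls as.numeric once on the whole matrix instead of once per column. I have not benchmarked it on large prediction files.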