r - Fast way to parse lists of numbers

Question

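I have a column in a CSV file that looks like c("","1","1 1e-3") (i.e. space-separated values). I am trying to loop over all the values, taking the sum() of the values wherever there is at least one value and returning NA otherwise.

My code currently does the following: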

x <- c("","1","1 2 3")
x2 <- as.numeric(rep(NA,length(x)))
for (i in 1:length(x)) {
  si <- scan(text=x[[i]],quiet=TRUE)
  if (length(si) > 0)
    x2[[i]] <- sum(si)
}

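I am struggling to make this fast; x is actually a set of columns from a CSV file with a few hundred thousand rows, and I figure it should be possible to do this within R.

(These are thinned samples from the posterior of a reversible-jump MCMC algorithm, so the dimensionality varies throughout the file; I am combining multiple values because I want columns that are useful.)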

score 3 · Accepted Answer

Building on @Chase's idea, but handling NA and avoiding a named helper function:

unlist(lapply(strsplit(x, " "),
              function(v)
                if (length(v) > 0)
                  sum(as.numeric(v))
                else
                  NA
))
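A possible variant of the same idea, sketched with vapply() so the result is guaranteed to be a numeric vector with one element per input string. The names x and res are only for illustration, and fixed = TRUE is my addition (it treats the separator as a literal space rather than a regular expression):

```r
# Illustrative input: empty string, single value, several values
x <- c("", "1", "1 2 3", "1 1e-3")

# strsplit() yields character(0) for "", so length(v) == 0 marks "no values"
res <- vapply(strsplit(x, " ", fixed = TRUE),
              function(v) if (length(v) > 0) sum(as.numeric(v)) else NA_real_,
              numeric(1))
print(res)
```

This assumes values are separated by exactly one space. With a double space, strsplit() produces an empty token, as.numeric("") is NA, and the sum for that element becomes NA as well.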

score 2 · Accepted Answer

This seems to run somewhat faster and may work for you.

#define a helper function
f <- function(x) sum(as.numeric(x))
unlist(lapply(strsplit(x, " "), f))
#-----
[1] 0 1 6

This returns zero rather than NA for the empty string, but maybe that is not a deal breaker for you?

Let's see how this scales to a larger problem:
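If the zero does matter, one way to patch it up afterwards (a sketch, not something I have timed) is to overwrite the entries whose input string was empty. nzchar() is base R; sums is just a name I picked here:

```r
x <- c("", "1", "1 2 3")

# Same helper as above: sum() over zero tokens gives 0
f <- function(v) sum(as.numeric(v))
sums <- unlist(lapply(strsplit(x, " "), f))

# Replace the result for empty input strings with NA in one vectorised step
sums[!nzchar(x)] <- NA
print(sums)
```

This keeps the per-element function as simple as possible and handles the empty case once, outside the loop.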

#set up variables
x3 <- rep(x, 1e5)
x4 <- as.numeric(rep(NA,length(x3)))
#initial approach
system.time(for (i in 1:length(x3)) {
  si <- scan(text=x3[[i]],quiet=TRUE)
  if (length(si) > 0)
    x4[[i]] <- sum(si)
})
#-----
   user  system elapsed 
   30.5     0.0    30.5 

#New approach:
system.time(unlist(lapply((strsplit(x3, " ")), f)))
#-----
   user  system elapsed 
   0.82    0.01    0.84
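Both versions above still call an R function once per element. Another option that might be worth timing on your own data is to flatten all tokens into one vector and sum them by group with rowsum(). This is only a sketch: sum_tokens_grouped is a hypothetical name, and I have not benchmarked it against the lapply() version.

```r
# Hypothetical helper: per-string sums, NA where the string has no tokens
sum_tokens_grouped <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  n <- lengths(parts)                      # number of tokens per string
  out <- rep(NA_real_, length(x))          # default NA for empty strings
  vals <- as.numeric(unlist(parts))        # all tokens in one numeric vector
  if (length(vals) > 0) {
    grp <- rep(seq_along(parts), n)        # index of the source string per token
    s <- rowsum(vals, grp)                 # one-column matrix of group sums
    out[as.integer(rownames(s))] <- s[, 1] # write sums back by string index
  }
  out
}

x <- c("", "1", "1 2 3")
out <- sum_tokens_grouped(x)
print(out)
```

Strings with no tokens never appear in grp, so their slots keep the initial NA.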
