r - Converting an R loop to functional form using apply

Question

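I wrote some R code that parses strings, counts the occurrences of substrings, and then fills in a table of substring counts. It works fine, but it is really slow on the actual (very large) data I'm using, and I know much of that is because I'm using loops rather than functions from the apply family. I've been trying to convert this code to a functional form, but I'm not having any luck. Can anyone help? My biggest problem is that I can't figure out a way to use the column names to match values inside an apply construct. Here is the code with some toy data: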

#Create toy data, list of unique substrings
code_frame<-matrix(c(c('a|a|b|c|d'),c('a|b|b|c|c'),c('a|b|c|d|d')),nrow=3,ncol=1)   
all_codes_list<-c('a','b','c','d')

#create data frame with a column for each code and a row for each job
code_count<-as.data.frame(matrix(0, ncol = length(all_codes_list), nrow = nrow(code_frame)))
colnames(code_count)<-all_codes_list

#fill in the code_count data frame with entries where codes occur
for(i in 1:nrow(code_frame)){
    test_string<-strsplit(code_frame[i,1],split="|",fixed=TRUE)[[1]]
    for(j in test_string){
        for(g in 1:ncol(code_count)){
            if(j == all_codes_list[g]){
                code_count[i,g]<-code_count[i,g]+1
            }
        }
    }
}

Thanks.

score 5 · Accepted Answer

A one-liner, split across 3 lines:

do.call(rbind,
        lapply(strsplit(code_frame[,1], "|", fixed=TRUE),
               function(x) table(factor(x, levels=all_codes_list))))

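Note that strsplit is vectorized, so you don't need the outer loop over all rows. Your inner loops are basically counting the occurrences of each code, which is a job for table. Finally, do.call(rbind, *) is the standard idiom for combining a list of rows into a single object (here a matrix, since each row is a table of counts; wrap it in as.data.frame if you need a data frame).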
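The three steps described above can be sketched separately, which may make the one-liner easier to follow. This is only an illustration on the toy data from the question; the intermediate names `parts`, `rows`, and `counts` are made up here and are not part of the original answer.

```r
# Toy data from the question
code_frame <- matrix(c('a|a|b|c|d', 'a|b|b|c|c', 'a|b|c|d|d'), nrow = 3, ncol = 1)
all_codes_list <- c('a', 'b', 'c', 'd')

# Step 1: strsplit is vectorized, so a single call splits every row
parts <- strsplit(code_frame[, 1], "|", fixed = TRUE)

# Step 2: table() on a factor with fixed levels counts every code,
# including codes that do not occur in a given row (count 0)
rows <- lapply(parts, function(x) table(factor(x, levels = all_codes_list)))

# Step 3: stack the per-row counts into one matrix
counts <- do.call(rbind, rows)

# Optional: convert to a data frame, like code_count in the question
code_count <- as.data.frame(counts)
print(code_count)
```

Fixing the factor levels to `all_codes_list` is what replaces the column-name matching in the original loop: every row is counted against the same set of codes, in the same order.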

score 4 · Accepted Answer

The qdap package has a tool that is well suited to this, called mtabulate. It should be quite fast and needs very little code:

library(qdap)    
mtabulate(strsplit(code_frame, "\\|"))

##   a b c d
## 1 2 1 1 1
## 2 1 2 2 0
## 3 1 1 1 2

Basically, it takes a list of vectors (the output of strsplit) and produces one row of tabulated counts for each vector.
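For readers without qdap installed, the same idea (one row of counts per vector in a list) can be approximated in base R. The sketch below is not qdap's actual implementation; it is an assumption-laden illustration using `tabulate` and `match`, and the names `parts`, `lvls`, and `counts` are invented for this example.

```r
# Toy data from the question
code_frame <- matrix(c('a|a|b|c|d', 'a|b|b|c|c', 'a|b|c|d|d'), nrow = 3, ncol = 1)

# A list of character vectors, one per input string
parts <- strsplit(code_frame[, 1], "|", fixed = TRUE)

# The columns are all distinct values seen across the vectors
lvls <- sort(unique(unlist(parts)))

# For each vector, map values to column positions and count them;
# vapply returns one column per vector, so transpose to get one row per vector
counts <- t(vapply(
  parts,
  function(x) tabulate(match(x, lvls), nbins = length(lvls)),
  integer(length(lvls))
))
colnames(counts) <- lvls

print(as.data.frame(counts))
```

Unlike the first answer, this version derives the columns from the data itself rather than from a predefined `all_codes_list`, so a code that never occurs would not get a column.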

Edit: If speed really is your thing, here is a benchmark over 1000 replications (microbenchmark package on a Win 7 machine):

Unit: microseconds
     expr      min       lq   median       uq      max neval
   HONG()  592.458  620.448  632.111  644.706 4650.560  1000
  TYLER()  324.220  342.413  351.743  361.073 3556.613  1000
 HENRIK() 1527.329 1560.450 1578.177 1614.331 4828.297  1000

And the visual output: [image]
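The wrapper functions behind the timings above are not shown in the answer. As a rough sketch of how such a comparison might be set up, the following wraps the loop from the question and the lapply approach in two hypothetical functions (`loop_version` and `lapply_version`, names invented here), checks that they agree, and times them only if the microbenchmark package happens to be installed. Timings will differ by machine and data size.

```r
# Toy data from the question
code_frame <- matrix(c('a|a|b|c|d', 'a|b|b|c|c', 'a|b|c|d|d'), nrow = 3, ncol = 1)
all_codes_list <- c('a', 'b', 'c', 'd')

# Hypothetical wrapper around the nested loop from the question
loop_version <- function() {
  code_count <- as.data.frame(matrix(0, ncol = length(all_codes_list),
                                     nrow = nrow(code_frame)))
  colnames(code_count) <- all_codes_list
  for (i in seq_len(nrow(code_frame))) {
    test_string <- strsplit(code_frame[i, 1], split = "|", fixed = TRUE)[[1]]
    for (j in test_string) {
      for (g in seq_len(ncol(code_count))) {
        if (j == all_codes_list[g]) {
          code_count[i, g] <- code_count[i, g] + 1
        }
      }
    }
  }
  code_count
}

# Hypothetical wrapper around the strsplit + table + rbind approach
lapply_version <- function() {
  do.call(rbind,
          lapply(strsplit(code_frame[, 1], "|", fixed = TRUE),
                 function(x) table(factor(x, levels = all_codes_list))))
}

# Check that both approaches produce the same counts before timing them
same_counts <- all(as.matrix(loop_version()) == lapply_version())
print(same_counts)

# Time both only when microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(loop_version(), lapply_version(),
                                       times = 100))
}
```

Checking that the results agree first is a cheap safeguard: a faster function that returns different counts is not a useful comparison.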

score 2 · Accepted Answer

Another base option:

df <- read.table(text = code_frame, sep = "|")

tt <- apply(df, 1, function(x){
  x2 <- factor(x, levels = letters[1:4])
  table(x2)
  })

t(tt) 

#      a b c d
# [1,] 2 1 1 1
# [2,] 1 2 2 0
# [3,] 1 1 1 2
