
I have a list of data.frames (d) that looks like this:

$ 1  :'data.frame':   1 obs. of  2 variables:
 ..$ index: int 2
 ..$ V1   : Factor w/ 125 levels "cgtsloqasmlkjybjlo,..:"
$ 2  :'data.frame':   1 obs. of  2 variables:
 ..$ index: int 2
 ..$ V1   : Factor w/ 125 levels "ponlohlofdctlo,..:"

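and so on for 1000 data frames. I need to count the number of unique letters that occur in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in each of the other 1000 data frames. I tried writing a function, but I am not an expert and it does not work.

This is my attempt at splitting the strings (it does not work):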

 chars = sapply(d, function(x) strsplit(as.character(d),"")) 

In addition, I need to count the occurrences of "lo" in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in each of the other 1000 data frames.

EDIT: the desired output would be a data.frame:

        Seq           length(unique_letters)   lo_occurrences
 cgtsloqasmlkjybjlo           13                       2      
   ponlohlofdctlo             9                        3     
   ..............           ............         ............    


 dput output: 
  dput(d[1:3])

structure(list(1 = structure(1000L, .Label = c("jhgfilsouilohgucaksfiaaknajdauloadbayrzjdhad", "fjkhqurtglowqgbdahhmolovdethabvfdalo", "....", "V1"), class = "factor")), .Names = c("1", "2", "3"))


1 Answer


One way to do it is this:

#simulating your list; I got an error trying to use your dput
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"), 
      data.frame(index = 2, V1 = "ponlohlofdctlo"))
d
#[[1]]
#  index                 V1
#1     2 cgtsloqasmlkjybjlo

#[[2]]
#  index             V1
#1     2 ponlohlofdctlo

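res <- do.call(rbind, lapply(d, function(x) {
  s <- as.character(x$V1)  # V1 is a factor, so convert it to character first
  data.frame(seq = s,
             # split into single characters and count the distinct ones
             length_uniques = length(unique(strsplit(s, "")[[1]])),
             # gregexpr() returns -1 when there is no match, so count only positive positions
             lo_counts = sum(gregexpr("lo", s, fixed = TRUE)[[1]] > 0))
}))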
res
#                 seq length_uniques lo_counts
#1 cgtsloqasmlkjybjlo             13         2
#2     ponlohlofdctlo              9         3
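
One detail to watch when counting matches: gregexpr() returns -1 when the pattern is absent, so the length of its result is 1 rather than 0 for a sequence with no "lo". Below is a minimal sketch of a vectorized variant that counts only real matches. It assumes each data frame holds a single row with a V1 column, as in the simulated list; count_seq_stats is a hypothetical helper name, and the third sequence ("abc") is made up purely to show the no-match case.

```r
# Hypothetical helper: summarise a character vector of sequences.
count_seq_stats <- function(seqs, pattern = "lo") {
  seqs <- as.character(seqs)
  # Number of distinct characters in each sequence
  n_unique <- vapply(strsplit(seqs, ""),
                     function(ch) length(unique(ch)), integer(1))
  # gregexpr() returns -1 for "no match", so count only positive positions
  n_hits <- vapply(gregexpr(pattern, seqs, fixed = TRUE),
                   function(m) sum(m > 0), integer(1))
  data.frame(seq = seqs, length_uniques = n_unique, lo_counts = n_hits,
             stringsAsFactors = FALSE)
}

# Simulated list; the third entry is an invented sequence without "lo"
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"),
          data.frame(index = 2, V1 = "ponlohlofdctlo"),
          data.frame(index = 2, V1 = "abc"))

# Pull V1 out of every data frame, then summarise all sequences at once
seqs <- vapply(d, function(x) as.character(x$V1), character(1))
res <- count_seq_stats(seqs)
res
```

Extracting the sequences first and summarising them in one pass avoids building and rbind-ing 1000 one-row data frames, which may matter little at this size but keeps the code shorter.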
Answered 2013-11-04T22:58:12.063