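I have a vector of strings, each of which is a comma-separated list of ids. I want to split each string into a list and store the length and the set of ids as two new columns in a data frame. Here is an example: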

df = data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""), stringsAsFactors=FALSE)
ids = sapply(df$ids, function (s) unlist(strsplit(as.character(s), ",")))
df$num.ids = sapply(ids, length)
df$ids.vec = sapply(ids, unlist)

So far, this looks fine:

> df
    ids num.ids ids.vec
1 a,b,c       3 a, b, c
2     d       1       d
3     e       1       e
4             0        
5   f,g       2    f, g
6             0        
7     h       1       h
8     i       1       i
9             0    

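But when I type summary(df), I get cryptic columns for ids.vec. More importantly, instead of computing a summary, summary lists every row (which is a problem when I apply it to my real data set).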

> summary(df)
      ids               num.ids  ids.vec.Length  ids.vec.Class  ids.vec.Mode
 Length:9           Min.   :0   3          -none-     character            
 Class :character   1st Qu.:0   1          -none-     character            
 Mode  :character   Median :1   1          -none-     character            
                    Mean   :1   0          -none-     character            
                    3rd Qu.:1   2          -none-     character            
                    Max.   :3   0          -none-     character            
                                1          -none-     character            
                                1          -none-     character            
                                0          -none-     character  

Any idea what I am doing wrong?

Thanks! Kevin

1 Answer

You are not doing anything wrong. As @joran mentioned, the real question is: what information do you want to get from summary()?

What you are seeing is a combination of two summaries:

# df1 is df less ids.vec;  df2 is only ids.vec
df1 <- df[,names(df) != "ids.vec"]
df2 <- df[,names(df) == "ids.vec"]

> summary(df1)  # summary for a data frame
     ids               num.ids 
 Length:9           Min.   :0  
 Class :character   1st Qu.:0  
 Mode  :character   Median :1  
                    Mean   :1  
                    3rd Qu.:1  
                    Max.   :3  

> summary(df2)   # summary for a list
      Length Class  Mode     
a,b,c 3      -none- character
d     1      -none- character
e     1      -none- character
      0      -none- character
f,g   2      -none- character
      0      -none- character
h     1      -none- character
i     1      -none- character
      0      -none- character
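The per-row layout comes from how summary() treats a list: it reports one row per element, with Length, Class, and Mode columns, rather than aggregate statistics. A minimal sketch with a toy list (the name `x` and its contents are made up for illustration, not taken from the question):

```r
# A toy list with two elements of different types (hypothetical data)
x <- list(a = 1:3, b = "z")

# summary() on a list returns a character table: one row per element
s <- summary(x)
print(s)

# The table always has these three columns
colnames(s)
nrow(s)
```

Because the result has one row per list element, a list column with thousands of elements produces thousands of rows in the summary.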

The formatting of the combined summary is a bit awkward.

Note that it puts the entire summary of the list into a single column:

> colnames(summary(df))
[1] "    ids"                                    
[2] "   num.ids"                                 
[3] "ids.vec.Length  ids.vec.Class  ids.vec.Mode"

Also note that df2 is a list, not a data frame:

> str(df2)
List of 9
 $ a,b,c: chr [1:3] "a" "b" "c"
 $ d    : chr "d"
 $ e    : chr "e"
 $      : chr(0) 
 $ f,g  : chr [1:2] "f" "g"
 $      : chr(0) 
 $ h    : chr "h"
 $ i    : chr "i"
 $      : chr(0)
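One way to confirm this without reading the str() output is to check the column classes directly. The sketch below rebuilds the question's data frame and inspects it; the `drop = FALSE` variant is an addition here, shown only to contrast the two subsetting results:

```r
# Rebuild the data frame exactly as in the question
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)
ids <- sapply(df$ids, function(s) unlist(strsplit(as.character(s), ",")))
df$num.ids <- sapply(ids, length)
df$ids.vec <- sapply(ids, unlist)

# ids.vec is a list column; the other two columns are atomic vectors
sapply(df, class)
is.list(df$ids.vec)

# Selecting a single column with [, j] drops down to the column itself (a list)
df2 <- df[, names(df) == "ids.vec"]
is.data.frame(df2)

# With drop = FALSE the result stays a one-column data frame
df2.keep <- df[, "ids.vec", drop = FALSE]
is.data.frame(df2.keep)
```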

That list is stored as a column of the original data frame:

> str(df)
'data.frame': 9 obs. of  3 variables:
 $ ids    : chr  "a,b,c" "d" "e" "" ...
 $ num.ids: int  3 1 1 0 2 0 1 1 0
 $ ids.vec:List of 9
  ..$ a,b,c: chr  "a" "b" "c"
  ..$ d    : chr "d"
  ..$ e    : chr "e"
  ..$      : chr 
  ..$ f,g  : chr  "f" "g"
  ..$      : chr 
  ..$ h    : chr "h"
  ..$ i    : chr "i"
  ..$      : chr 
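If the goal is a compact summary on a large data set, one possible approach (a sketch, not the only option) is to summarise only the atomic columns, or to keep a collapsed string version of the ids next to the list column. The names `atomic.cols` and `ids.str` are made up for this example, and `lengths()` assumes R 3.2.0 or later:

```r
# Rebuild the example; strsplit() already returns a list, one element per row
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)
ids <- strsplit(df$ids, ",", fixed = TRUE)
df$num.ids <- lengths(ids)   # number of ids per row
df$ids.vec <- ids            # list column

# Option 1: summarise only the atomic (non-list) columns
atomic.cols <- vapply(df, is.atomic, logical(1))
summary(df[atomic.cols])

# Option 2: keep a plain character column alongside the list column
df$ids.str <- vapply(df$ids.vec, paste, character(1), collapse = ", ")
summary(df[c("ids", "num.ids", "ids.str")])
```

Either way the list column stays available for row-wise work, while summary() only sees columns it can aggregate.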
answered 2012-11-10T00:16:05.497