I currently have data in the following format (note: this is a 1-column, 4-row matrix):

aa|bb  
bb|cc|ee|ee  
cc  
cc|ee

I want it displayed so that the column names are aa, bb, cc, dd, and ee, with 4 rows in which each row counts the number of times each string appears in the corresponding row above.

i.e.:

aa bb cc dd ee  
 1  1  0  0  0  
 0  1  1  0  2  
 0  0  1  0  0   
 0  0  1  0  1 

Does anyone know how to do this in R? I would post my attempt, but it is just getting ugly and complicated. Any help would be much appreciated.

Thanks in advance.

1 Answer

Here's one idea:

# (You'll use as.vector() on your matrix to get the vector x.)
x <- c("aa|bb", "bb|cc|ee|ee", "cc", "cc|ee") 

levs <- c("aa", "bb", "cc", "dd", "ee")
ll <- strsplit(x, "\\|")
t(sapply(ll, function(X) table(c(levs, X)))) - 1
#      aa bb cc dd ee
# [1,]  1  1  0  0  0
# [2,]  0  1  1  0  2
# [3,]  0  0  1  0  0
# [4,]  0  0  1  0  1

This may clarify (at least a bit) what the last line of code does:

table(c(levs, c("dd", "cc", "cc", "cc"))) - 1
# 
# aa bb cc dd ee 
#  0  0  3  1  0 
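
As a possible alternative (a sketch, not part of the answer above): instead of padding each row with levs and subtracting 1, each split row can be converted to a factor with fixed levels, so that table() reports a zero for any level that is absent. The names x, levs, and ll are reused from the code above; counts is a name introduced here for illustration.

```r
# Same input and level set as in the answer above.
x <- c("aa|bb", "bb|cc|ee|ee", "cc", "cc|ee")
levs <- c("aa", "bb", "cc", "dd", "ee")

# fixed = TRUE treats "|" as a literal character, so no regex escaping is needed.
ll <- strsplit(x, "|", fixed = TRUE)

# A factor with explicit levels makes table() count missing levels as 0.
counts <- t(sapply(ll, function(X) table(factor(X, levels = levs))))
counts
```

This should give the same 4 x 5 count matrix, and it avoids the subtract-one step, at the cost of an extra factor() call per row.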
Answered 2012-10-26T23:19:27.427