These solutions don't get you exactly where you want to be, but they may get you close enough to work from there.
First, some data:
temp <- structure(list(Name = c("sample1", "sample2", "sample3"),
                       Value1 = c("ttn", "bae", "pas"),
                       Value2 = c("mth", "ttn.1", "kasd"),
                       Value3 = c("lik", "apk", "mth")),
                  .Names = c("Name", "Value1", "Value2", "Value3"),
                  class = "data.frame", row.names = c(NA, -3L))
temp
#      Name Value1 Value2 Value3
# 1 sample1    ttn    mth    lik
# 2 sample2    bae  ttn.1    apk
# 3 sample3    pas   kasd    mth
These data are in a "wide" form. Use reshape() to convert them to a "long" form.
temp1 <- reshape(temp, direction = "long",
                 idvar = "Name", varying = 2:4, sep = "")
temp1
#              Name time Value
# sample1.1 sample1    1   ttn
# sample2.1 sample2    1   bae
# sample3.1 sample3    1   pas
# sample1.2 sample1    2   mth
# sample2.2 sample2    2 ttn.1
# sample3.2 sample3    2  kasd
# sample1.3 sample1    3   lik
# sample2.3 sample2    3   apk
# sample3.3 sample3    3   mth
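If letting reshape() infer the new column name from sep = "" feels opaque, the same conversion can probably be written with the arguments spelled out. This is only a sketch, assuming the same example data as above; the object name long_explicit is made up for illustration.

```r
# Same example data as above, rebuilt so the snippet runs on its own
temp <- data.frame(Name   = c("sample1", "sample2", "sample3"),
                   Value1 = c("ttn", "bae", "pas"),
                   Value2 = c("mth", "ttn.1", "kasd"),
                   Value3 = c("lik", "apk", "mth"),
                   stringsAsFactors = FALSE)

# Name the stacked column and the time index explicitly
# instead of having reshape() split "Value1" into "Value" and 1
long_explicit <- reshape(temp, direction = "long", idvar = "Name",
                         varying = c("Value1", "Value2", "Value3"),
                         v.names = "Value", timevar = "time", times = 1:3)
long_explicit
```

This should give the same Name, time and Value columns as temp1.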
Now, use aggregate() from base R or dcast() from the "reshape2" package to aggregate on the "Value" column.
aggregate(Name ~ Value, temp1, c)
#   Value             Name
# 1   apk          sample2
# 2   bae          sample2
# 3  kasd          sample3
# 4   lik          sample1
# 5   mth sample1, sample3
# 6   pas          sample3
# 7   ttn          sample1
# 8 ttn.1          sample2
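One thing to be aware of: with c as the aggregating function, the Name column comes back as a list, because the groups have different lengths. If a plain character column is easier to work with, toString() is one possible substitute. A minimal sketch, assuming the long-format data from above (rebuilt here so it runs on its own); the name agg is only illustrative.

```r
# Long-format data with the same values as temp1 above
temp1 <- data.frame(
  Name  = rep(c("sample1", "sample2", "sample3"), times = 3),
  Value = c("ttn", "bae", "pas", "mth", "ttn.1", "kasd", "lik", "apk", "mth"),
  stringsAsFactors = FALSE
)

# toString() collapses each group into a single comma-separated string,
# so Name is an ordinary character vector rather than a list column
agg <- aggregate(Name ~ Value, temp1, toString)
agg
```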
library(reshape2)
dcast(temp1, Value ~ Name, value.var = "Value")
#   Value sample1 sample2 sample3
# 1   apk    <NA>     apk    <NA>
# 2   bae    <NA>     bae    <NA>
# 3  kasd    <NA>    <NA>    kasd
# 4   lik     lik    <NA>    <NA>
# 5   mth     mth    <NA>     mth
# 6   pas    <NA>    <NA>     pas
# 7   ttn     ttn    <NA>    <NA>
# 8 ttn.1    <NA>   ttn.1    <NA>
You also mentioned that you wanted to "calculate frequencies", in which case table() may also be appropriate:
table(temp1$Value, temp1$Name)
#
#         sample1 sample2 sample3
#   apk         0       1       0
#   bae         0       1       0
#   kasd        0       0       1
#   lik         1       0       0
#   mth         1       0       1
#   pas         0       0       1
#   ttn         1       0       0
#   ttn.1       0       1       0
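If by "frequency" you mean the number of samples in which each value occurs, the table can be reduced one step further. This is a guess at what you are after, not something stated in the question; the names tab and n_samples are made up for the example.

```r
# Long-format data with the same values as temp1 above
temp1 <- data.frame(
  Name  = rep(c("sample1", "sample2", "sample3"), times = 3),
  Value = c("ttn", "bae", "pas", "mth", "ttn.1", "kasd", "lik", "apk", "mth"),
  stringsAsFactors = FALSE
)

tab <- table(temp1$Value, temp1$Name)

# Count, for each value, how many samples contain it at least once
n_samples <- rowSums(tab > 0)
n_samples
```

Here "mth" would come out as 2 and every other value as 1.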