Data construction:
dat <- list( seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
             seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
             seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A") )
This gives you the first-order transition counts for each sequence:
lapply( dat, function(s) table( s,             # start state
                                c(s[-1], NA)   # next state (NA after the last one)
                              ) )
# look at matrix( c( s, c(s[-1],NA) ), ncol=2) to verify
$seq1
s A B C D
A 1 2 1 0
B 0 0 1 1
C 1 0 0 2
D 1 0 1 0
$seq2
s A B C D
A 0 1 0 2
B 2 0 1 0
C 0 1 1 1
D 1 1 0 1
$seq3
s A B C D
A 0 0 1 2
B 1 0 0 0
C 0 1 1 1
D 3 0 1 0
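To see why `table(s, c(s[-1], NA))` counts transitions, it can help to lay out the (current, next) pairs explicitly. The following is only a sketch for `seq1`; the names `pairs` and `counts` are illustrative and not part of the answer above.

```r
# seq1 copied from the data above
s <- c("A","B","C","D","C","A","C","D","A","A","B","D")

# Pair each state with its successor; the last state has no successor,
# so it is paired with NA.
pairs <- data.frame(from = s, to = c(s[-1], NA))
print(pairs)

# table() ignores NA by default (useNA = "no"), so the final row is dropped
# and exactly length(s) - 1 transitions are counted.
counts <- table(pairs$from, pairs$to)
sum(counts)   # 11
```

Rows of `counts` are the starting states and columns are the states that follow them.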
This accumulates those counts across sequences without averaging:
Reduce( "+", lapply( dat, function(s) table( s, c(s[-1],NA) ) ) )
s A B C D
A 1 3 2 4
B 3 0 2 1
C 1 2 2 4
D 5 1 2 1
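Adding tables with `Reduce("+", ...)` only works when every per-sequence table has the same shape, which holds here because each sequence contains all four states both as a start and as a next state. One possible way to guarantee that shape is to fix the factor levels up front. This is a sketch under the assumption that the full state space is known in advance; `states` and `count_transitions` are hypothetical names introduced for illustration.

```r
dat <- list(
  seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
  seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
  seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A")
)

# Assumption: the complete set of states is known beforehand.
states <- c("A", "B", "C", "D")

# Hypothetical helper: always returns a length(states) x length(states) table,
# even if a sequence never visits some state.
count_transitions <- function(s, states) {
  table(from = factor(head(s, -1), levels = states),
        to   = factor(s[-1],       levels = states))
}

total <- Reduce(`+`, lapply(dat, count_transitions, states = states))
total

# A sequence that skips states C and D still yields a 4 x 4 table.
dim(count_transitions(c("A", "B", "A"), states))
```

Here `head(s, -1)` and `s[-1]` play the same role as `s` and `c(s[-1], NA)`: the last element simply has no successor and is left out.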
This might be one way to get a transition matrix from that result:
prop.table(
Reduce( "+", lapply( dat, function(s) table( s, c(s[-1],NA) ) ) )
, 1) # specifies row-proportions
s A B C D
A 0.1000000 0.3000000 0.2000000 0.4000000
B 0.5000000 0.0000000 0.3333333 0.1666667
C 0.1111111 0.2222222 0.2222222 0.4444444
D 0.5555556 0.1111111 0.2222222 0.1111111
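The second argument of `prop.table` selects the margin: `1` divides each cell by its row total, so every row becomes a probability distribution over the next state. A small sketch that checks this, with the pooled counts typed in by hand from the output above (the names `counts` and `P` are illustrative):

```r
# Pooled counts copied from the Reduce() output above
counts <- matrix(c(1, 3, 2, 4,
                   3, 0, 2, 1,
                   1, 2, 2, 4,
                   5, 1, 2, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(from = c("A","B","C","D"),
                                 to   = c("A","B","C","D")))

P <- prop.table(counts, 1)   # divide each row by its row sum

# Same result as dividing by the row sums directly
all.equal(P, counts / rowSums(counts))

# Each row of a transition matrix should sum to 1
rowSums(P)
```

A state that never appears as a start would have a row total of zero and produce `NaN` in that row, so such rows may need separate handling.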
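Here is a new strategy: convert each per-sequence table to a long-format data frame and stack them, which does not require the tables to have identical dimensions: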
newdat <- do.call('rbind', lapply(lapply( dat, function(s) table( s,
c(s[-1],NA)
) ) , as.data.frame))
str(newdat)
'data.frame': 41 obs. of 3 variables:
$ s : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
$ Var2: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 2 2 2 2 3 3 ...
$ Freq: int 1 0 1 1 2 0 0 0 1 1 ...
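With the newdat object you can simply use xtabs to tabulate Freq by the s and Var2 columns to get the sums. (Note that the str() output above and the tables below do not correspond to the dat defined at the top: that dat yields 48 rows, three tables of 16 cells each, and the same totals as the Reduce() result, so these outputs presumably come from a different version of the data.)
xtabs( Freq ~ s + Var2, newdat)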
Var2
s A B C D
A 1 3 1 6
B 3 1 2 1
C 1 1 1 3
D 6 2 1 1
Then redo the prop.table operation to get row proportions:
prop.table(xtabs( Freq ~ s + Var2, newdat), 1)
#---------
Var2
s A B C D
A 0.09090909 0.27272727 0.09090909 0.54545455
B 0.42857143 0.14285714 0.28571429 0.14285714
C 0.16666667 0.16666667 0.16666667 0.50000000
D 0.60000000 0.20000000 0.10000000 0.10000000
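As a cross-check, the long-format route and the `Reduce()` route should agree when run on the same input. The following sketch uses the `dat` defined at the top; `tabs`, `by_reduce` and `by_xtabs` are illustrative names.

```r
dat <- list(
  seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
  seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
  seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A")
)

tabs <- lapply(dat, function(s) table(s, c(s[-1], NA)))

# Route 1: add the tables cell by cell
by_reduce <- Reduce(`+`, tabs)

# Route 2: stack the long-format tables and re-tabulate
newdat <- do.call(rbind, lapply(tabs, as.data.frame))
by_xtabs <- xtabs(Freq ~ s + Var2, newdat)

nrow(newdat)                                   # 48: three 4 x 4 tables
all(unclass(by_reduce) == unclass(by_xtabs))   # TRUE
```

The second route is the more forgiving one when sequences do not all contain the same states, since `rbind` and `xtabs` work on whatever state pairs are present.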