Suppose I have three sequences:

dat <- list( Seq1 =c("A", "B", "C", "D", "C", "A", "C","D","A","A","B","D"),
             Seq2 = c("C" ,"C" ,"B" ,"A" ,"D" ,"D" ,"A" ,"B","C","D","B","A","D"),
             Seq3 = c("D" ,"A" ,"D" ,"A" ,"D", "B", "B", "A","D","A","D","A"))

These sequences are stored in three different CSV files. I want to compute a first-order Markov chain from these data in aggregate.

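# actionsoverall: vector of all possible states; files: vector of CSV paths
t <- matrix(0, nrow = length(actionsoverall), ncol = length(actionsoverall))

for (i in files) {
  y <- read.csv(i)$x
  yy <- match(y, actionsoverall)  # map each state to its row/column index
  for (j in 1:(length(y) - 1)) {
    # count the transition from position j to position j + 1
    t[yy[j], yy[j + 1]] <- t[yy[j], yy[j + 1]] + 1
  }
}

# convert the pooled counts to row-wise probabilities
for (h in 1:length(actionsoverall)) {
  t[h, ] <- t[h, ] / sum(t[h, ])
}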

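In effect, I want to read the sequence from each file and pool the counts (e.g., A to B occurs 2 times in file 1, 1 time in file 2, and 3 times in file 3, and A occurs 10 times in total, so the probability is 6/10).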

Note: if I compute the transition probabilities for each file separately and then average them, will the result be the same?

1 Answer

Data construction:

dat <- list( seq1 =c( "A", "B", "C","D","C","A", "C","D","A","A","B","D"),
 seq2 =c( "C","C","B","A","D","D","A","B","C","D","B","A","D"),
 seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A"))

This gives you the first-order transition counts for each sequence:

 lapply( dat, function(s) table( s,          # start
                                 c(s[-1],NA) # next
                                 ) )

#look at matrix( c( s, c(s[-1],NA) ), ncol=2) to verify

$seq1

s   A B C D
  A 1 2 1 0
  B 0 0 1 1
  C 1 0 0 2
  D 1 0 1 0

$seq2

s   A B C D
  A 0 1 0 2
  B 2 0 1 0
  C 0 1 1 1
  D 1 1 0 1

$seq3

s   A B C D
  A 0 0 1 2
  B 1 0 0 0
  C 0 1 1 1
  D 3 0 1 0
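To see why tabulating a sequence against its shifted copy counts transitions, it can help to look at the pairs directly. The following is a small sketch using seq1 from above; the name `pairs` is only illustrative.

```r
s <- c("A","B","C","D","C","A","C","D","A","A","B","D")  # seq1 from above

# Each row is one (current state, next state) pair
pairs <- cbind(from = s, to = c(s[-1], NA))
head(pairs, 3)   # A->B, B->C, C->D
tail(pairs, 1)   # the last state has no successor, so 'to' is NA

# table() ignores NA by default, so the final row is not counted:
# 12 observations give 11 transitions
n_transitions <- sum(table(s, c(s[-1], NA)))
n_transitions
```

Padding with NA keeps both vectors the same length, which table() requires.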

This accumulates those counts across the sequences without averaging:

 Reduce( "+", lapply( dat, function(s) table( s, c(s[-1],NA) ) ) )

s   A B C D
  A 1 3 2 4
  B 3 0 2 1
  C 1 2 2 4
  D 5 1 2 1

This is one way to get a transition matrix from that result:

prop.table( 
     Reduce( "+", lapply( dat, function(s) table( s, c(s[-1],NA) ) ) ) 
      , 1)  # specifies row-proportions

s           A         B         C         D
  A 0.1000000 0.3000000 0.2000000 0.4000000
  B 0.5000000 0.0000000 0.3333333 0.1666667
  C 0.1111111 0.2222222 0.2222222 0.4444444
  D 0.5555556 0.1111111 0.2222222 0.1111111
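On the question of whether averaging the per-file transition matrices gives the same result: in general it does not. Pooling the counts weights each sequence by how often it leaves a given state, whereas a plain average weights every sequence equally. The sketch below compares the two on the data above; the names `counts`, `pooled` and `averaged` are only illustrative.

```r
dat <- list(
  seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
  seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
  seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A")
)

# Transition counts per sequence
counts <- lapply(dat, function(s) table(s, c(s[-1], NA)))

# Pooled: add the counts first, then normalise each row
pooled <- prop.table(Reduce("+", counts), 1)

# Averaged: normalise each sequence first, then average the matrices
averaged <- Reduce("+", lapply(counts, prop.table, 1)) / length(counts)

pooled["A", "B"]    # 3/10 = 0.3
averaged["A", "B"]  # (2/4 + 1/3 + 0/3) / 3, about 0.278
```

The two estimates coincide only when every sequence has the same row totals. If the goal is a single chain estimated from all the data, pooling the counts is the usual choice.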

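Here is a new strategy. It also works when the sequences do not all contain the same set of states, in which case the per-sequence tables have different dimensions and cannot simply be added. The output below corresponds to the sequences as given in the question, where Seq3 contains no "C" (hence 16 + 16 + 9 = 41 rows):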

newdat <- do.call('rbind', lapply(lapply( dat, function(s) table( s,         
                              c(s[-1],NA) 
                              ) ) , as.data.frame))
str(newdat)
'data.frame':   41 obs. of  3 variables:
 $ s   : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
 $ Var2: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 2 2 2 2 3 3 ...
 $ Freq: int  1 0 1 1 2 0 0 0 1 1 ...

With the newdat object you can simply use xtabs to sum Freq over the s and Var2 columns:

>  xtabs( Freq ~ s + Var2, newdat)
   Var2
s   A B C D
  A 1 3 1 6
  B 3 1 2 1
  C 1 1 1 3
  D 6 2 1 1

Then redo the prop.table operation to get the row proportions.

prop.table(xtabs( Freq ~ s + Var2, newdat), 1)
#---------
   Var2
s            A          B          C          D
  A 0.09090909 0.27272727 0.09090909 0.54545455
  B 0.42857143 0.14285714 0.28571429 0.14285714
  C 0.16666667 0.16666667 0.16666667 0.50000000
  D 0.60000000 0.20000000 0.10000000 0.10000000
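An alternative for sequences that do not all visit every state is to fix the factor levels up front, so that every per-sequence table has the same shape and the counts can be added directly. This is only a sketch: `count_transitions` is a hypothetical helper, and it assumes the full set of states is known in advance.

```r
# Hypothetical helper: transition counts for one sequence over a fixed state set
count_transitions <- function(s, states) {
  from <- factor(s[-length(s)], levels = states)  # current state
  to   <- factor(s[-1],         levels = states)  # next state
  table(from, to)                                 # always length(states)^2 cells
}

states <- c("A", "B", "C", "D")  # assumed to be known in advance

# The question's sequences; Seq3 never visits "C"
dat <- list(
  Seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
  Seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
  Seq3 = c("D","A","D","A","D","B","B","A","D","A","D","A")
)
# With CSV files as in the question, something like this could build dat instead
# (assuming a vector 'files' of paths and a column named x):
# dat <- lapply(files, function(f) as.character(read.csv(f)$x))

counts <- Reduce("+", lapply(dat, count_transitions, states = states))
trans  <- prop.table(counts, 1)  # row proportions
trans
```

A state that is never left would give a row of zero counts, and prop.table would return NaN for that row, so such rows need separate handling.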
answered 2018-04-06T18:41:32.620