我正在编写代码模拟一个非常简单的马尔可夫链,以从两个转换矩阵中的任何一个生成 10000 个 6 核苷酸序列(即,如果前一个核苷酸是 A,那么使用这组概率来生成下一个核苷酸,依此类推)。我还获得了通过从所述矩阵中获取相应概率获得的运行产品,但这些产品也没有正确传播。
我知道 MATLAB 可能不会像其他语言那样处理字符串/字符,所以我不完全确定我的代码发生了什么。基本上,我正在抽样,但它似乎没有正确分配。我的代码如下。诚然,它不优雅,但它应该工作......但它没有。考虑到这些转换矩阵,有没有更简单的方法来做到这一点?(显然,到目前为止,该代码仅适用于第一个矩阵 (M1)。)
clear all;
close all;
%length of sequences
n = 6;
%nucleotide map
N = 'ACGT';
%initial probabilities
% A C G T
M0 = [0.25 0.25 0.25 0.25];
%transition probabilities (if row, then col)
%inside CpG
% A C G T
M1 = [0.18 0.274 0.426 0.12 %A
0.171 0.367 0.274 0.188 %C
0.161 0.339 0.375 0.125 %G
0.079 0.355 0.384 0.182]; %T
%outside CpG
% A C G T
M2 = [0.3 0.205 0.285 0.21 %A
0.322 0.298 0.078 0.302 %C
0.248 0.246 0.298 0.208 %G
0.177 0.239 0.292 0.292]; %T
sampletype = input('Matrix Type (M1 or M2): ', 's');
sampseq = zeros(n,1);
PM1 = 1;
PM2 = 1;
count = 0;
if strcmp(sampletype,'M1')
for i = 1:10000
for position = 1:n
if position == 1
firstnuc = randsample(N,1,true,M0);
sampseq(1) = firstnuc;
PM1 = PM1*0.25;
PM2 = PM2*0.25;
%check for previous nucleotide
elseif position ~= 1
if strcmp(sampseq(position-1),'A')
tempnuc = randsample(N,1,true,M1(1,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(1,findstr(N,tempnuc));
PM2 = PM2*M2(1,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'C')
tempnuc = randsample(N,1,true,M1(2,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(2,findstr(N,tempnuc));
PM2 = PM2*M2(2,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'G')
tempnuc = randsample(N,1,true,M1(3,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(3,findstr(N,tempnuc));
PM2 = PM2*M2(3,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'T')
tempnuc = randsample(N,1,true,M1(4,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(4,findstr(N,tempnuc));
PM2 = PM2*M2(4,findstr(N,tempnuc));
end
end
end
if PM2 > PM1
count = count + 1;
end
PM1
PM2
PM1 = 1;
PM2 = 1;
sampseq
sampseq = zeros(n,1);
end
end
count