
Hey, I have a cell array whose second column holds the number of times each 'XX->XX' transition occurs, for example:

'AA->AA'    [21]    [4.2084]
'AA->AC'    [15]    [3.0060]
'AA->AG'    [ 9]    [1.8036]
'AA->AT'    [12]    [2.4048]
'AC->CA'    [14]    [2.8056]
'AC->CC'    [16]    [3.2064]
'AC->CG'    [ 5]    [1.0020]
'AC->CT'    [ 3]    [0.6012]
'AG->GA'    [11]    [2.2044]
'AG->GC'    [ 5]    [1.0020]
'AG->GG'    [ 8]    [1.6032]
'AG->GT'    [13]    [2.6052]
'AT->TA'    [10]    [2.0040]
'AT->TC'    [ 8]    [1.6032]
'AT->TG'    [ 2]    [0.4008]
'AT->TT'    [11]    [2.2044]
'CA->AA'    [17]    [3.4068]
'CA->AC'    [ 7]    [1.4028]
'CA->AG'    [ 9]    [1.8036]
'CA->AT'    [11]    [2.2044]
'CC->CA'    [15]    [3.0060]
'CC->CC'    [ 5]    [1.0020]
'CC->CG'    [ 4]    [0.8016]
'CC->CT'    [17]    [3.4068]
'CG->GA'    [ 1]    [0.2004]
'CG->GC'    [ 2]    [0.4008]
'CG->GG'    [ 9]    [1.8036]
'CG->GT'    [ 3]    [0.6012]
'CT->TA'    [ 7]    [1.4028]
'CT->TC'    [ 9]    [1.8036]
'CT->TG'    [ 9]    [1.8036]
'CT->TT'    [ 2]    [0.4008]
'GA->AA'    [10]    [2.0040]
'GA->AC'    [ 4]    [0.8016]
'GA->AG'    [10]    [2.0040]
'GA->AT'    [ 2]    [0.4008]
'GC->CA'    [ 2]    [0.4008]
'GC->CC'    [ 7]    [1.4028]
'GC->CG'    [ 6]    [1.2024]
'GC->CT'    [ 3]    [0.6012]
'GG->GA'    [ 6]    [1.2024]
'GG->GC'    [ 6]    [1.2024]
'GG->GG'    [ 4]    [0.8016]
'GG->GT'    [ 8]    [1.6032]
'GT->TA'    [ 6]    [1.2024]
'GT->TC'    [11]    [2.2044]
'GT->TG'    [ 8]    [1.6032]
'GT->TT'    [ 5]    [1.0020]
'TA->AA'    [ 8]    [1.6032]
'TA->AC'    [13]    [2.6052]
'TA->AG'    [ 9]    [1.8036]
'TA->AT'    [ 6]    [1.2024]
'TC->CA'    [13]    [2.6052]
'TC->CC'    [13]    [2.6052]
'TC->CT'    [ 4]    [0.8016]
'TG->GA'    [ 8]    [1.6032]
'TG->GC'    [ 5]    [1.0020]
'TG->GG'    [ 3]    [0.6012]
'TG->GT'    [ 6]    [1.2024]
'TT->TA'    [13]    [2.6052]
'TT->TC'    [ 2]    [0.4008]
'TT->TG'    [ 3]    [0.6012]
'TT->TT'    [ 5]    [1.0020]

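Now I'm trying to compute the probabilities: P('AA->AA') = TIMES('AA->AA') / SUM('AA->AA','AA->AC','AA->AG','AA->AT'); in other words, P('AA->AA') = TIMES('AA->AA') / SUM('AA->Any'). The same goes for all the others. I'd like to do this with a loop, but there is an edge case: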

'TC->CA'    [13]    [2.6052]
'TC->CC'    [13]    [2.6052]
'TC->CT'    [ 4]    [0.8016]

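Here the count for 'TC->CG' is obviously 0, and even though we already know its probability should be 0, it still has to be accounted for. Of course, this edge case can occur in any other group too: sometimes 'TT->TT' may be missing, or sometimes 'TC->CT'. Does anyone know how to do this? Thanks.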
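One loop-free way to deal with missing transitions is to group the rows by their two-letter source state ('TC', 'TT', ...) and divide each count by its group total. An absent transition such as 'TC->CG' contributes nothing to the denominator, so its probability is implicitly 0. The following is a minimal sketch, assuming the tag/count/percentage layout shown above; the seven-row `data1` is only a subset of the question's data, and the `lookup` helper is a hypothetical convenience, not part of any library.

```matlab
% Sketch: data1 is assumed to be an N-by-3 cell array laid out as in the
% question (tag, count, percentage). Only a subset of rows is used here.
data1 = {'TC->CA', 13, 2.6052;
         'TC->CC', 13, 2.6052;
         'TC->CT',  4, 0.8016;
         'TT->TA', 13, 2.6052;
         'TT->TC',  2, 0.4008;
         'TT->TG',  3, 0.6012;
         'TT->TT',  5, 1.0020};

tags   = char(data1(:,1));       % N-by-6 char matrix of tags
counts = cell2mat(data1(:,2));   % N-by-1 vector of counts

% Group index for each row, based on the 2-letter source state
[~, ~, grp] = unique(cellstr(tags(:,1:2)));

% Total count per source state (missing transitions add nothing)
group_totals = accumarray(grp, counts);

% P(XY->YZ) = TIMES(XY->YZ) / SUM of TIMES over all observed XY->Y* rows
probs = counts ./ group_totals(grp);

% Append the probabilities as a fourth column
data1(:,4) = num2cell(probs);

% Hypothetical helper: probability of any tag, 0 if the tag is absent
lookup = @(t) sum(probs(strcmp(cellstr(tags), t)));
```

With this approach there is no need to enumerate all 64 combinations up front; `lookup('TC->CG')` returns 0 because the tag never appears in `tags`.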


1 Answer


Try this:

%%// Get the cell data into data1
data1 = INPUT_DATA;

%%// Get the data from columns separately
col1 = data1(:,1);
tag_data = vertcat(col1{:});

col2 = data1(:,2);
times_data = vertcat(col2{:});

col3 = data1(:,3);
col3_data = vertcat(col3{:});

%%// Get full data for tag, times and column3
char_array = ['A' 'C' 'G' 'T'];
full_tag_data = char_array(combinator(4,3,'p','r'));
full_tag_data = [full_tag_data(:,1:2) repmat('->',[size(full_tag_data,1) 1]) full_tag_data(:,2:3)];

%%// loc gives, for each full tag, its row in the input (0 if missing),
%%// so the input rows do not need to be sorted
[present_rows, loc] = ismember(full_tag_data,tag_data,'rows');
full_times_data = zeros(size(present_rows));
full_times_data(present_rows) = times_data(loc(present_rows));

full_col3_data = zeros(size(present_rows));
full_col3_data(present_rows) = col3_data(loc(present_rows));

%%// Sum the times over each block of 4 rows sharing the same 'XX->' prefix
full_times_summed = sum(reshape(full_times_data,4,[]),1);
full_times_summed = reshape(repmat(full_times_summed,[4 1]),[],1);

%%// Store the required values into a cell array out_cell1
out_cell1 = cell(size(present_rows,1),4);
out_cell1(:,1) = cellstr(full_tag_data);
out_cell1(:,2) = num2cell(full_times_data);
out_cell1(:,3) = num2cell(full_col3_data);

%%// The probabilities are added into the cell array as the fourth column
%%// (a prefix that never occurs at all would give 0/0 = NaN)
out_cell1(:,4) = num2cell(full_times_data./full_times_summed);

Note: the code above uses a function, combinator, which can be found here.
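If you would rather avoid the extra dependency, the 64 tags can also be generated with the built-in `ndgrid`. This is a sketch that assumes `combinator(4,3,'p','r')` returns its rows in lexicographic order (first letter varying slowest), which is the order the `reshape(...,4,[])` grouping relies on, since every 4 consecutive rows must share the same 'XX->' prefix.

```matlab
% Sketch of a combinator-free way to build all 4^3 = 64 tags.
char_array = ['A' 'C' 'G' 'T'];

% ndgrid: the first output varies fastest, the third varies slowest
[k3, k2, k1] = ndgrid(1:4, 1:4, 1:4);

% 64-by-3 index matrix in lexicographic order: [1 1 1; 1 1 2; ...]
idx = [k1(:) k2(:) k3(:)];

% Indexing a row vector with a matrix keeps the matrix shape (64-by-3)
triples = char_array(idx);

% 'XYZ' -> 'XY->YZ'
full_tag_data = [triples(:,1:2) repmat('->', 64, 1) triples(:,2:3)];
```

In this ordering, row 55 is 'TC->CG', the tag that is missing from the question's data.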

answered 2014-03-26T11:10:29.240