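So I'm working on a project (in C++) in which I have to calculate the probability of each nucleotide in DNA sequences read from a text file. I've already worked out other information about the file, such as the mean sequence length, variance, deviation, and so on.
For example, given
"atgatatgagc"
how can I compute the probability that an 'a' pops up, or a 't', and so on?
Any tips or suggestions?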
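char letter = 'a';
std::string str = "abcd";
// Fraction of characters in str equal to letter (needs <algorithm>, <iostream>, <string>)
std::cout << static_cast<double>(std::count(str.begin(), str.end(), letter)) / str.size();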
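As a minimal sketch, the one-liner above can be wrapped in a small function and applied to each of the four letters in turn. The name `base_frequency` is hypothetical (it does not come from the answer), and returning 0.0 for an empty sequence is an assumption made here to avoid dividing by zero.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: fraction of characters in `sequence` equal to `base`.
// Returns 0.0 for an empty sequence (an assumption, to avoid dividing by zero).
double base_frequency(const std::string& sequence, char base) {
    if (sequence.empty()) {
        return 0.0;
    }
    // std::count scans the whole string once for this one letter.
    const auto occurrences = std::count(sequence.begin(), sequence.end(), base);
    return static_cast<double>(occurrences) / static_cast<double>(sequence.size());
}
```

For the question's example "atgatatgagc" (11 letters: four 'a', three 't', three 'g', one 'c'), `base_frequency(seq, 'a')` gives 4/11, 't' and 'g' each give 3/11, and 'c' gives 1/11. The comparison is case-sensitive, so 'A' and 'a' are counted separately.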
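Without more information, and assuming every letter is equally likely, the probability of any given letter "popping up" is 1/4, given that there are four possible letters: A, T, G, and C.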
A slight modification of Leonid Volnitsky's code:
#include <iostream>
#include <algorithm>
#include <string>
using namespace std ;
int main(void)
{
char character_A='A';
char character_C='C';
char character_G='G';
char character_T='T';
string DNA_Sequence="ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA";
int occurrences_A=std::count(DNA_Sequence.begin(), DNA_Sequence.end(), character_A);
double probability_A =(double) occurrences_A/ DNA_Sequence.size();
int occurrences_C=std::count(DNA_Sequence.begin(), DNA_Sequence.end(), character_C);
double probability_C =(double) occurrences_C/ DNA_Sequence.size();
int occurrences_G=std::count(DNA_Sequence.begin(), DNA_Sequence.end(), character_G);
double probability_G =(double) occurrences_G/ DNA_Sequence.size();
int occurrences_T=std::count(DNA_Sequence.begin(), DNA_Sequence.end(), character_T);
double probability_T =(double) occurrences_T/ DNA_Sequence.size();
cout<<"In the DNA sequence \n\n["<<DNA_Sequence <<"] \n\n\n" ;
cout<<"The probability of ["<<character_A <<"] in the sequence = "<<probability_A <<" ("<<probability_A*100 <<"%) ("<<occurrences_A<<" A's) \n" ;
cout<<"The probability of ["<<character_C <<"] in the sequence = "<<probability_C <<" ("<<probability_C*100 <<"%) ("<<occurrences_C<<" C's) \n" ;
cout<<"The probability of ["<<character_G <<"] in the sequence = "<<probability_G <<" ("<<probability_G*100 <<"%) ("<<occurrences_G<<" G's) \n" ;
cout<<"The probability of ["<<character_T <<"] in the sequence = "<<probability_T <<" ("<<probability_T*100 <<"%) ("<<occurrences_T<<" T's) \n\n" ;
cout<<"Cross check : "<<(probability_A*100)<<"% + "<<( probability_C*100)<<"% + "<<( probability_G*100)<<"% + "<<( probability_T*100)<<
"% = "<< (probability_A*100) + ( probability_C*100) + ( probability_G*100) + ( probability_T*100) <<"% \n";
cout<<"Sequence size = "<<DNA_Sequence.size()<<" (A + C + G + T = "<<occurrences_A+occurrences_C+occurrences_G+occurrences_T<<") \n\n";
cout<<" \nPress any key to continue\n";
cin.get();
return 0;
}
Output:
In the DNA sequence
[ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGG
TGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTT
GGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAA
GGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAA
ATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA]
The probability of [A] in the sequence = 0.214674 (21.4674%) (79 A's)
The probability of [C] in the sequence = 0.334239 (33.4239%) (123 C's)
The probability of [G] in the sequence = 0.285326 (28.5326%) (105 G's)
The probability of [T] in the sequence = 0.165761 (16.5761%) (61 T's)
Cross check : 21.4674% + 33.4239% + 28.5326% + 16.5761% = 100%
Sequence size = 368 (A + C + G + T = 368)
Press any key to continue