这是处理表示的快速简便的代码。
public static enum Nucleotide {
A,B,C,D;
}
public static int setbit(int val, int pos, boolean on) {
if (on) {
// set bit
return val | (1 << (8-pos-1));
}
else {
// unset bit
return val & ~(1 << (8-pos-1));
}
}
public static int set2bits(int val, int pos, int bits) {
// set/unset the first bit
val = setbit(val, pos, (bits & 2) > 0);
// set/unset the second bit
val = setbit(val, pos+1, (bits & 1) > 0);
return val;
}
public static int setNucleotide(int sequence, int pos, Nucleotide tide) {
// set both bits based on the ordinal position in the enum
return set2bits(sequence, pos*2, tide.ordinal());
}
public static void setNucleotide(int [] sequence, int pos, Nucleotide tide) {
// figure out which element in the array to work with
int intpos = pos/4;
// figure out which of the 4 bit pairs to work with.
int bitpos = pos%4;
sequence[intpos] = setNucleotide(sequence[intpos], bitpos, tide);
}
public static Nucleotide getNucleotide(int [] sequence, int pos) {
int intpos = pos/4;
int bitpos = pos%4;
int val = sequence[intpos];
// get the bits for the requested on, and shift them
// down into the least significant bits so we can
// convert batch to the enum.
int shift = (8-(bitpos+1)*2);
int tide = (val & (3 << shift)) >> shift;
return Nucleotide.values()[tide];
}
public static void main(String args[]) {
int sequence[] = new int[4];
setNucleotide(sequence, 4, Nucleotide.C);
System.out.println(getNucleotide(sequence, 4));
}
显然,发生了很多位移,但少量的评论应该对正在发生的事情有意义。
当然,这种表示的缺点是您以 4 个为一组工作。如果要说 10 个核苷酸,则必须在计数中保留另一个变量,以便您知道序列中的最后 2 个核苷酸不是有用。
如果没有别的,可以用蛮力完成模糊匹配。您将输入一个 N 核苷酸序列,然后从 0 开始,检查核苷酸 0:N-1 并查看有多少匹配。然后你从 1:N 然后 2:N+1 等...