bioinformatics - How do I read a DNA sequence from a text file and store it in an array in C?

Question

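How can I read a DNA sequence from a text file in C, store it in an array, and extract all substrings of a given length starting at every nucleotide position?

For example, the sequence appears in the text file like this: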

cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat

All substrings from all starting positions:

If the substring length = 3:

cct, ctg, tga, gat, ..., cat

score 0 · Accepted Answer

Is C mandatory for you?

I would switch to a higher-level language such as Python, where this function does the job:

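def iterate_fragments(sequence, size):
    """Takes a string and yields every piece of the given size."""
    # Slicing a string never raises IndexError, so the loop is bounded
    # explicitly: the last valid start position is len(sequence) - size.
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]

# The example sequence from the question.
dna_sequence = "cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat"

for fragment in iterate_fragments(dna_sequence, 3):
    print(fragment)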

This simple code prints every DNA fragment (3 nucleotides long).
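The question also asks about reading the sequence from a text file, which the snippet above leaves out. A minimal sketch of that step might look like the following. It assumes the file holds only the raw sequence (no FASTA header line), possibly wrapped over several lines. The names `read_sequence` and `fragments` are hypothetical helpers made up for this illustration, and the temporary file stands in for your real input file.

```python
import os
import tempfile

def read_sequence(path):
    """Read a sequence from a text file, dropping spaces and line breaks."""
    with open(path) as handle:
        # split() with no arguments splits on any whitespace, so joining
        # the pieces removes newlines from a sequence wrapped over lines.
        return "".join(handle.read().split())

def fragments(sequence, size):
    """Return every substring of length `size`, one per start position."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

# Demo only: write the question's sequence to a temporary file, then read
# it back. In practice you would pass the path of your own file instead.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("cctgatagacgctatctggctatccaggtacttaggtcct\n")
    tmp.write("ctgtgcgaatctatgcgtttccaaccat\n")
    demo_path = tmp.name

dna_sequence = read_sequence(demo_path)
os.remove(demo_path)

result = fragments(dna_sequence, 3)
print(result[:4], "...", result[-1])
# prints: ['cct', 'ctg', 'tga', 'gat'] ... cat
```

Returning a list keeps all substrings in memory at once, which is fine for a sequence this short. For very long sequences a generator would avoid building the whole list.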
