skbio - TabularMSA replacement for Alignment (scikit-bio 0.4.1.dev0)

Question

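I want to read an alignment in FASTA format, update the sequence labels, and write the result back to a file in PHYLIP format. How should I edit the following lines to use TabularMSA (instead of the previously supported Alignment) in scikit-bio 0.4.1.dev0?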

from skbio import Alignment
...
msa_fa = Alignment.read(gene_msa_fa_fp, format='fasta')
msa_fa_update_ids, new_to_old_ids = msa_fa.update_ids(func=id_mapper)
msa_fa_update_ids.write(output_msa_phy_fp, format='phylip')
...

Thanks!

score 1 · Accepted Answer

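When a FASTA file is read into a TabularMSA object, the sequence identifiers are stored in each sequence's metadata dictionary under the key "id". When a TabularMSA object is written in PHYLIP format, the MSA's index property is used to label the sequences. Use reassign_index to set the FASTA sequence identifiers as the MSA's index, then map them to the sequence labels you want written, and finally write the MSA out in PHYLIP format: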

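from skbio import TabularMSA, DNA

# Read the FASTA alignment; each sequence keeps its identifier in metadata['id'].
msa = TabularMSA.read("aln.fasta", constructor=DNA)
# Use the FASTA identifiers as the MSA's index.
msa.reassign_index(minter='id')
# Map the identifiers to the labels to be written (id_mapper as in the question).
msa.reassign_index(mapping=id_mapper)
# The index labels become the PHYLIP sequence labels.
msa.write('aln.phy', format='phylip')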

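There are several ways to set the index, including setting the index property directly or calling reassign_index with either its mapping or its minter parameter.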
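The question never shows what id_mapper looks like, so here is a minimal sketch of one possibility, using only the standard library. The helper name make_id_mapper, the "seq" prefix, and the example identifiers are all hypothetical. It assumes the goal is short labels, since strict PHYLIP labels are limited to 10 characters and, as far as I know, scikit-bio refuses to write longer ones. It also returns a reverse lookup, similar in spirit to the new_to_old_ids value in the question's code.

```python
# Hypothetical helper (not part of scikit-bio): build an old-label -> new-label
# mapping whose values fit the 10-character limit of strict PHYLIP labels.


def make_id_mapper(old_ids, prefix="seq", max_len=10):
    """Return (old_to_new, new_to_old) dicts with short, unique labels."""
    old_ids = list(old_ids)
    if len(set(old_ids)) != len(old_ids):
        raise ValueError("sequence identifiers must be unique")

    old_to_new = {}
    for number, old_id in enumerate(old_ids, start=1):
        new_id = f"{prefix}{number}"
        if len(new_id) > max_len:
            raise ValueError(f"label {new_id!r} exceeds {max_len} characters")
        old_to_new[old_id] = new_id

    # Reverse lookup so the original identifiers can be recovered later.
    new_to_old = {new: old for old, new in old_to_new.items()}
    return old_to_new, new_to_old


# Made-up identifiers standing in for the FASTA headers of an alignment.
fasta_ids = ["Escherichia_coli_K12_16S", "Bacillus_subtilis_168_16S"]
id_mapper, new_to_old_ids = make_id_mapper(fasta_ids)
print(id_mapper)
```

Because reassign_index accepts a dict for its mapping parameter, the resulting id_mapper could be passed as msa.reassign_index(mapping=id_mapper) right after msa.reassign_index(minter='id'). In practice the identifiers would come from the MSA itself (for example, list(msa.index) after the minter step) rather than from a hand-written list.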
