I have a list called:
FirstSequenceToSplit
It contains one item, a DNA sequence, say:
'ATTTTACGTA'
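I can easily return the length of this item, so the user knows it is 10 characters long. What I then want to do is let the user say which indices they want to extract, e.g. [0:6], and then generate two items in a new list. The first item has the characters at the user-defined indices, with a question mark in place of every character that was not extracted; the second item is the opposite.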
So, to illustrate what I want: if the user says they want [0:5], you would get a new list with the following items:
['ATTTT?????','?????ACGTA']
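A minimal sketch of the split step, assuming the indices follow normal Python slice semantics (start inclusive, end exclusive, so [0:5] keeps the first five characters). The function name `split_sequence` and its `mask` parameter are hypothetical, not from the question; the code avoids Python 2-only syntax so it runs under Python 2 and Python 3.

```python
def split_sequence(seq, start, end, mask='?'):
    """Return [kept, complement] for the slice seq[start:end].

    kept:       characters inside seq[start:end], every other position masked
    complement: characters outside seq[start:end], every inside position masked
    """
    # Normalise the indices exactly like seq[start:end] would (clamping, negatives)
    start, end, _ = slice(start, end).indices(len(seq))
    end = max(end, start)  # treat end < start as an empty slice
    kept = mask * start + seq[start:end] + mask * (len(seq) - end)
    complement = seq[:start] + mask * (end - start) + seq[end:]
    return [kept, complement]


# Hypothetical usage with the list from the question
FirstSequenceToSplit = ['ATTTTACGTA']
result = split_sequence(FirstSequenceToSplit[0], 0, 5)
print(result)  # ['ATTTT?????', '?????ACGTA']
```

Because both halves are built from `seq[:start]`, `seq[start:end]` and `seq[end:]`, each result has the same length as the original, and every position is filled in exactly one of the two.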
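This is part of a larger problem. I have a set of DNA sequences in FASTA format ('>Sequence1\nATTTTACGTA', '>Sequence2\nATTGCACGTA', etc.), and I want the user to be able to choose a sequence by its ID and have that sequence split, according to a predefined input, into two new sequences named with a and b suffixes ('>Sequence1a\n?????ACGTA', '>Sequence1b\nATTTT?????', '>Sequence2\nATTGCACGTA', etc.). So far I have handled this by printing the names of the sequences, letting the user choose one to splice, and extracting just the sequence (without the ID). Once I have solved the problem shown above, I will create a new list containing the new items.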
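One possible way to handle the larger problem, sketched under a few assumptions: records are stored as (id, sequence) pairs in file order, a header line is everything after `>`, sequence lines may wrap, and the a/b naming and order follow the example above ('a' keeps the characters outside the slice, 'b' keeps those inside). The names `parse_fasta`, `split_record` and `to_fasta` are hypothetical helpers, not an existing library API.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (id, sequence) pairs, keeping file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            records.append([line[1:].strip(), ''])  # new record: ID from header
        elif records:
            records[-1][1] += line  # join wrapped sequence lines
    return [(seq_id, seq) for seq_id, seq in records]


def split_record(records, target_id, start, end, mask='?'):
    """Replace the record named target_id with '<id>a' and '<id>b' halves."""
    out = []
    for seq_id, seq in records:
        if seq_id != target_id:
            out.append((seq_id, seq))  # untouched records keep their position
            continue
        lo, hi, _ = slice(start, end).indices(len(seq))
        hi = max(hi, lo)
        inside = mask * lo + seq[lo:hi] + mask * (len(seq) - hi)
        outside = seq[:lo] + mask * (hi - lo) + seq[hi:]
        out.append((seq_id + 'a', outside))
        out.append((seq_id + 'b', inside))
    return out


def to_fasta(records):
    """Format (id, sequence) pairs back into FASTA text."""
    return '\n'.join('>%s\n%s' % (seq_id, seq) for seq_id, seq in records)


records = parse_fasta('>Sequence1\nATTTTACGTA\n>Sequence2\nATTGCACGTA\n')
new_records = split_record(records, 'Sequence1', 0, 5)
print(to_fasta(new_records))
```

Keeping the ID and the sequence together in one pair avoids relying on the ID being a fixed number of characters long.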
As I'm a beginner (as I'm sure is obvious by now!), I would appreciate an explanation of any code given. Many thanks for any help you can provide.
My code so far is:
# Python 2 script (uses raw_input and print statements)
import sys
import re

# Creating format so more user friendly (ANSI terminal escape codes)
class color:
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

fileName = raw_input("Give the name of the Fasta file you wish to divide up ")
# i.e TopTenFasta

# Reading in the sequences, splitting them by the > symbol
# ([1:] drops the empty string that comes before the first '>')
with open(fileName, "r") as in_file:
    sequences = in_file.read().split('>')[1:]

# Putting all these sequences into a list
allSequences = list(sequences)

# Letting you know how many sequences there are in total
NumberOfSequences = len(allSequences)
print color.BOLD + "The Number of Sequences in this list is: " + color.END, NumberOfSequences

# Returning the names of the IDs to allow you to decide which ones to split
# (the ID is everything before the first newline, whatever its length)
SequenceIds = []
for x in allSequences:
    SequenceIds.append(x.split('\n', 1)[0].strip())
print color.BOLD + "With the following names: " + color.END, "\n", "\n".join(SequenceIds)

#-----------------------Starting the Splice ------------------------------------
#-----------------------------------------------------------------------------
#------------------------------------------------------------------------------
# Choosing the sequence you wish to splice
FirstSequenceToSplitID = raw_input(color.BOLD + "Which sequence would you like to splice " + color.END)

# Seeing whether that item is in the list
if FirstSequenceToSplitID in SequenceIds:
    print "valid input"
else:
    print "invalid input"

# making a new list (FirstSequenceToSplit) and putting into it just the sequence (no ID)
FirstSequenceToSplit = []
for listItem in allSequences:
    header, _, body = listItem.partition('\n')
    if header.strip() == FirstSequenceToSplitID:
        # join wrapped lines and drop newlines so len() counts only bases
        FirstSequenceToSplit.append(''.join(body.split()))

# Printing the Length of the sequence to splice
for element in FirstSequenceToSplit:
    print color.BOLD + "The Length of this sequence is" + color.END, len(element)
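A possible next step for the script above is to ask the user for the indices as text and turn that into two integers before splitting. This is a sketch only: the name `parse_slice` is hypothetical, and it assumes the user types something like `0:5` or `[0:5]` and that only non-negative ranges within the sequence should be accepted.

```python
def parse_slice(text, length):
    """Turn user input such as '0:5' or '[0:5]' into a (start, end) pair.

    Raises ValueError if the text is not two integers separated by ':',
    or if the range does not fit inside a sequence of the given length.
    """
    cleaned = text.strip().strip('[]')  # allow optional square brackets
    parts = cleaned.split(':')
    if len(parts) != 2:
        raise ValueError("expected start:end, got %r" % text)
    start, end = int(parts[0]), int(parts[1])  # int() ignores surrounding spaces
    if not 0 <= start <= end <= length:
        raise ValueError("range %d:%d is outside 0:%d" % (start, end, length))
    return start, end


print(parse_slice('[0:5]', 10))  # (0, 5)
```

In the Python 2 script this could be called inside the final loop as `parse_slice(raw_input("Indices to extract (e.g. 0:5) "), len(element))`, catching `ValueError` to ask again when the input is invalid.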