我创建了一个读取 ID 对列表的函数(即 [("A","B"),("B","C"),("C","D"),...] 和序列ID 从头到尾包括任何分支。
每个有序 ID 的列表都保存在一个名为 Alignment 的类中,该函数使用递归来处理分支,方法是创建一个新的对齐,从分支从主列表拆分的 ID 开始。
我发现使用某些输入可以达到 Python 设置的最大递归限制。我知道我可以使用 sys.setrecursionlimit() 来增加这个限制,但是由于我不知道有多少分支组合是可能的,所以我想避免这种策略。
我已经阅读了几篇关于将递归函数转换为迭代函数的文章,但我无法确定处理这个特定函数的最佳方法,因为递归发生在函数的中间并且可能是指数的。
你们中的任何人都可以提供任何建议吗?
谢谢,布赖恩
代码贴在下面:
def buildAlignments(alignment, alignmentList, endIDs):
while alignment.start in endIDs:
#If endID only has one preceding ID: add preceding ID to alignment
if len(endIDs[alignment.start]) == 1:
alignment.add(endIDs[alignment.start][0])
else:
#List to hold all branches that end at spanEnd
branches = []
for each in endIDs[alignment.start]:
#New alignment for each branch
al = Alignment(each)
#Recursively process each new alignment
buildAlignments(al, branches, endIDs)
branches.append(al)
count = len(branches)
i = 0
index = 0
#Loop through branches by length
for branch in branches:
if i < count - 1:
#Create copy of original alignment and add branch to alignment
al = Alignment(alignment)
al += branch #branches[index]
alignmentList.append(al)
i += 1
#Add single branch to existing original alignment
else: alignment += branch #branches[index]
index += 1
def main():
IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]
#Gather all startIDs with corresponding endIDs and vice versa
startIDs = {}
endIDs = {}
for pair in IDs:
if not pair[0] in startIDs: startIDs[pair[0]] = []
startIDs[pair[0]].append(pair[1])
if not pair[1] in endIDs: endIDs[pair[1]] = []
endIDs[pair[1]].append(pair[0])
#Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
alignments = [Alignment(end) for end in endIDs if not end in startIDs]
#Build build sequences in each original Alignment
i = len(alignments)
while i:
buildAlignments(alignments[i-1], alignments, endIDs)
i -= 1
编辑:我应该指出,提供的 ID 只是我用于测试此算法的一个小样本。实际上,ID 的序列可能长达数千,其中包含许多分支和分支的分支。
解决方案:感谢 Andrew Cooke。新方法在调用堆栈上似乎更简单、更容易。我确实对他的代码做了一些小的调整,以更好地适应我的目的。我在下面包含了完整的解决方案:
from collections import defaultdict
def expand(line, have_successors, known):
#print line
known.append(line)
for child in have_successors[line[-1]]:
newline = line + [child]
if line in known: known.remove(line)
yield expand(newline, have_successors, known)
def trampoline(generator):
stack = [generator]
while stack:
try:
generator = stack.pop()
child = next(generator)
stack.append(generator)
stack.append(child)
except StopIteration:
pass
def main(pairs):
have_successors = defaultdict(lambda: set())
links = set()
for (start, end) in pairs:
links.add(end)
have_successors[start].add(end)
known = []
for node in set(have_successors.keys()):
if node not in links:
trampoline(expand([node], have_successors, known))
for line in known:
print line
if __name__ == '__main__':
main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])
更改摘要:交换链接和 have_successors 以创建从开始到结束的列表添加if line in known: known.remove(line)
以扩展,以便仅保留从字符串到列表的完整系列更改行变量,以便处理单个 ID 中的多个字符。
更新:所以我刚刚发现我首先遇到所有这些问题的原因是对我提供的 ID 列表中的循环引用进行了处理。既然循环引用是固定的,任何一种方法都可以按预期工作。- 再次感谢你的帮助。