尝试这个 :
def toTree(expression):
tree = dict()
msg =""
stack = list()
for char in expression:
if(char == '('):
stack.append(msg)
msg = ""
elif char == ')':
parent = stack.pop()
if parent not in tree:
tree[parent] = list()
tree[parent].append(msg)
msg = parent
else:
msg += char
return tree
expression = "(Root(AB(ABC)(CBA))(CD(CDE)(FGH)))"
print toTree(expression)
它返回一个字典,可以使用键 '' 访问根目录。然后你可以做一个简单的 BFS 来打印输出。
输出 :
{
'' : ['Root'],
'AB' : ['ABC', 'CBA'],
'Root': ['AB', 'CD'],
'CD' : ['CDE', 'FGH']
}
在开始之前,您必须消除表达式中的所有空格,或者通过在 for-loop 的第一行添加以下内容来忽略表达式中不相关的字符:
if char == <IRRELEVANT CHARACTER>:
continue
上面的代码将在 O(n) 时间内运行,其中 n 是表达式的长度。
编辑
这是打印功能:
def printTree(tree, node):
if node not in tree:
return
print '%s -> %s' % (node, ' '.join(child for child in tree[node]))
for child in tree[node]:
printTree(tree, child)
可以通过以下方式实现所需的输出:
expression = "(Root(AB(ABC)(CBA))(CD(CDE)(FGH)))"
tree = toTree(expression)
printTree(tree, tree[''][0])
输出
Root -> AB CD
AB -> ABC CBA
CD -> CDE FGH
编辑
假设节点名称不是唯一的,我们只需为节点指定新名称。这可以使用:
def parseExpression(expression):
nodeMap = dict()
counter = 1
node = ""
retExp =""
for char in expression:
if char == '(' or char == ')' :
if (len(node) > 0):
nodeMap[str(counter)] = node;
retExp += str(counter)
counter +=1
retExp += char
node =""
elif char == ' ': continue
else :
node += char
return retExp,nodeMap
打印功能现在将更改为:
def printTree(tree, node, nodeMap):
if node not in tree:
return
print '%s -> %s' % (nodeMap[node], ' '.join(nodeMap[child] for child in tree[node]))
for child in tree[node]:
printTree(tree, child, nodeMap)
可以使用以下命令获得输出:
expression = " ( Root( SQ ( VBZ ) ( NP ( DT ) ( NN ) ) ( VP ( VB ) ( NP ( NN ) ) ) ))"
expression, nodeMap = parseExpression(expression)
tree = toTree(expression)
printTree(tree, tree[''][0], nodeMap)
输出 :
Root -> SQ
SQ -> VBZ NP VP
NP -> DT NN
VP -> VB NP
NP -> NN