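The recursive function is parseMML. I want it to parse a MathML expression into a Python expression. The simple example mmlinput is meant to produce the fraction 3/5, but the function produces: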

['(', '(', '3', ')', '/', '(', '5', ')', '(', '3', ')', '(', '5', ')', ')']

instead of:

['(', '(', '3', ')', '/', '(', '5', ')', ')']

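because I do not know how to get rid of the elements that have already been handled by the recursive call. Any ideas on how to skip them?

Thanks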

mmlinput='''<?xml version="1.0"?> <math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/1998/Math/MathML http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd"> <mrow> <mfrac> <mrow> <mn>3</mn> </mrow> <mrow> <mn>5</mn> </mrow> </mfrac> </mrow> </math>'''


def parseMML(mmlinput):
    from lxml import etree
    from StringIO import *
    from lxml import objectify
    exppy=[]
    events = ("start", "end")
    context = etree.iterparse(StringIO(mmlinput),events=events)
    for action, elem in context:
        if (action=='start') and (elem.tag=='mrow'):
            exppy+='('
        if (action=='end') and (elem.tag=='mrow'):
            exppy+=')'
        if (action=='start') and (elem.tag=='mfrac'):
            mmlaux=etree.tostring(elem[0])
            exppy+=parseMML(mmlaux)
            exppy+='/'
            mmlaux=etree.tostring(elem[1])
            exppy+=parseMML(mmlaux)
        if action=='start' and elem.tag=='mn': #this is a number
            exppy+=elem.text
    return (exppy)

1 Answer


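The problem is that you parse the subtree inside the mfrac tag twice: once in the recursive call, and again when the outer loop carries on over the same child elements. A quick fix is to count the recursion level: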

mmlinput = "<math> <mrow> <mfrac> <mrow> <mn>3</mn> </mrow> <mrow> <mn>5</mn> </mrow> </mfrac> </mrow> </math>"

def parseMML(mmlinput):
    from lxml import etree
    from StringIO import *
    from lxml import objectify
    exppy=[]
    events = ("start", "end")
    level = 0
    context = etree.iterparse(StringIO(mmlinput),events=events)
    for action, elem in context:
        if (action=='start') and (elem.tag=='mfrac'):
            level += 1
            mmlaux=etree.tostring(elem[0])
            exppy+=parseMML(mmlaux)
            exppy+='/'
            mmlaux=etree.tostring(elem[1])
            exppy+=parseMML(mmlaux)
        if (action=='end') and (elem.tag=='mfrac'):
            level -= 1
        if level:
            continue
        if (action=='start') and (elem.tag=='mrow'):
            exppy+='('
        if (action=='end') and (elem.tag=='mrow'):
            exppy+=')'
        if action=='start' and elem.tag=='mn': #this is a number
            exppy+=elem.text
    return (exppy)
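
If an event stream is not required, another option is to parse the document once and walk the tree recursively, so that each element is visited exactly once and nothing has to be skipped. The following is only a sketch under a few assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml, the names local_name, to_tokens and parse_mml are made up for illustration, and only mrow, mfrac and mn are handled. The local_name helper strips the namespace prefix, so the namespaced input from the question should work as well.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' part, if any, from an element tag."""
    return tag.rsplit('}', 1)[-1]


def to_tokens(elem):
    """Build the token list for one element, visiting each child once."""
    name = local_name(elem.tag)
    if name == 'mn':  # a number: keep it as a single token
        return [(elem.text or '').strip()]
    if name == 'mfrac':  # exactly two children: numerator, denominator
        numerator, denominator = list(elem)
        return to_tokens(numerator) + ['/'] + to_tokens(denominator)
    tokens = []
    for child in elem:
        tokens.extend(to_tokens(child))
    if name == 'mrow':  # a row becomes a parenthesised group
        return ['('] + tokens + [')']
    return tokens  # any other wrapper, such as <math>, adds nothing


def parse_mml(text):
    """Parse a MathML string and return its token list."""
    return to_tokens(ET.fromstring(text))


sample = ("<math> <mrow> <mfrac> <mrow> <mn>3</mn> </mrow> "
          "<mrow> <mn>5</mn> </mrow> </mfrac> </mrow> </math>")
print(parse_mml(sample))
# ['(', '(', '3', ')', '/', '(', '5', ')', ')']
```

Because the recursion follows the tree structure itself, nested fractions need no level counter.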

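Note: I had to remove the namespaces to get this to work, because elem.tag returns the fully qualified tag name for me. You are also using += to add strings to a list. That may work for single-character strings, but += on a list works like calling extend, so: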

>>> lst = []
>>> lst += 'spam'
>>> lst
['s', 'p', 'a', 'm']
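
To add a string as a single item, append is probably what you want. A small sketch of the difference (the variable name tokens is only for illustration); with +=, a multi-digit number such as 35 would be split into '3' and '5':

```python
tokens = []
tokens += '35'         # extends the list with each character: '3', '5'
tokens.append('35')    # adds the whole string as one item
tokens += ['/']        # extending with a one-item list also adds one item
print(tokens)
# ['3', '5', '35', '/']
```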
answered 2013-07-03T00:52:59.947