
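I'm trying to use pyparsing to parse chemical formulas that may be nested and may have non-integer stoichiometry. What I want is a list of every element present in the formula together with its total stoichiometry.

I started from the example on the pyparsing wiki and looked at fourFn.py for more ideas, but I'm having trouble understanding how to use all the features of the package.

I came up with the following grammar: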

from pyparsing import Word, Group, ZeroOrMore, Combine,\
     Optional, OneOrMore, ParseException, Literal, nums,\
     Suppress, Dict, Forward

caps = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lowers = caps.lower()
digits = "0123456789"
integer = Word( digits )
parl = Literal("(").suppress()
parr = Literal(")").suppress()

element = Word( caps, lowers )
separator = Literal( "," ).setParseAction(lambda s,l,t: t[0].replace(',','.')) | Literal( "." )

nreal = (Combine( integer + Optional( separator +\
    Optional( integer ) ))\
    | Combine( separator + integer )).setParseAction( lambda s,l,t: [ float(t[0]) ] )

block = Forward()
groupElem = Group( element + Optional( nreal, default=1)) ^ \
     Group( parl + block + parr + Optional( nreal,default=1 ) )
block << groupElem + ZeroOrMore( groupElem )
formula = OneOrMore( block )

Non-nested formulas work as expected:

>>> formula.parseString('H2O')
([(['H', 2.0], {}), (['O', 1], {})], {})

Despite those empty fields (which I can't find a use for), I can extract the information I want.

But when I try something like:

>>> formula.parseString('C6H8(OH)4')
([(['C', 6.0], {}), (['H', 8.0], {}), ([(['O', 1], {}), (['H', 1], {}), 4.0], {})], {})

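I can see that the formula is parsed correctly, but I want the outer "4" in (OH)4 to multiply the numbers inside the parentheses, and I can't see how to do that.

How can one token change the value of another token?

Or, how can I walk these results and write a function that, when a block has an outer number attached, computes the total for each element inside the block?

Thanks in advance.

Edit 1: I believe I need something like this: whenever "(block) nreal" occurs, suppress the outer nreal and multiply every nreal inside the block by that outer value...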
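As a rough sketch of the idea in Edit 1, assuming the parse results are first converted to plain nested lists (for example with asList()) in the shape shown above: an atom is [symbol, count], and a parenthesized group is a list of items whose last entry is the outer multiplier. The helper name `flatten` is made up for illustration and is not part of pyparsing.

```python
def flatten(items, mult=1):
    """Return a flat list of [symbol, count] pairs with outer multipliers applied.

    Assumes each item is either an atom [symbol, count] or a group
    [item, item, ..., outer_multiplier], as in the output shown above.
    """
    flat = []
    for item in items:
        if isinstance(item[0], str):
            # Atom: scale its count by the multiplier inherited so far
            flat.append([item[0], item[1] * mult])
        else:
            # Group: the last entry is the outer number, the rest are items
            *inner, outer = item
            flat.extend(flatten(inner, mult * outer))
    return flat


# Nested-list form of the result for 'C6H8(OH)4'
parsed = [['C', 6.0], ['H', 8.0], [['O', 1], ['H', 1], 4.0]]
print(flatten(parsed))
# [['C', 6.0], ['H', 8.0], ['O', 4.0], ['H', 4.0]]
```

Passing the accumulated multiplier down the recursion means groups nested inside other groups are scaled by the product of all enclosing numbers.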

2 Answers

Recursion is definitely needed to solve this. In pyparsing, you define a recursive grammar using the Forward class. See the comments in this code sample:

from pyparsing import (Suppress, Word, nums, alphas, Regex, Forward, Group, 
                        Optional, OneOrMore, ParseResults)
from collections import defaultdict

"""
BNF for simple chemical formula (no nesting)

    integer :: '0'..'9'+
    element :: 'A'..'Z' 'a'..'z'*
    term :: element [integer]
    formula :: term+


BNF for nested chemical formula

    integer :: '0'..'9'+
    element :: 'A'..'Z' 'a'..'z'*
    term :: (element | '(' formula ')') [integer]
    formula :: term+

"""

LPAR,RPAR = map(Suppress,"()")
integer = Word(nums)

# add parse action to convert integers to ints, to support doing addition 
# and multiplication at parse time
integer.setParseAction(lambda t:int(t[0]))

element = Word(alphas.upper(), alphas.lower())
# or if you want to be more specific, use this Regex
# element = Regex(r"A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|E[rsu]|F[emr]?|"
#                 "G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|M[dgnot]|N[abdeiop]?|"
#                 "Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|"
#                 "Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]")

# forward declare 'formula' so it can be used in definition of 'term'
formula = Forward()

term = Group((element | Group(LPAR + formula + RPAR)("subgroup")) + 
                Optional(integer, default=1)("mult"))

# define contents of a formula as one or more terms
formula << OneOrMore(term)


# add parse actions for parse-time processing

# parse action to multiply out subgroups
def multiplyContents(tokens):
    t = tokens[0]
    # if these tokens contain a subgroup, then use multiplier to
    # extend counts of all elements in the subgroup
    if t.subgroup:
        mult = t.mult
        for term in t.subgroup:
            term[1] *= mult
        return t.subgroup
term.setParseAction(multiplyContents)

# add parse action to sum up multiple references to the same element
def sumByElement(tokens):
    elementsList = [t[0] for t in tokens]

    # construct set to see if there are duplicates
    duplicates = len(elementsList) > len(set(elementsList))

    # if there are duplicate element names, sum up by element and
    # return a new nested ParseResults
    if duplicates:
        ctr = defaultdict(int)
        for t in tokens:
            ctr[t[0]] += t[1]
        return ParseResults([ParseResults([k,v]) for k,v in ctr.items()])
formula.setParseAction(sumByElement)


# run some tests
tests = """\
    H
    NaCl
    HO
    H2O
    HOH
    (H2O)2
    (H2O)2OH
    ((H2O)2OH)12
    C6H5OH
    """.splitlines()
for t in tests:
    if t.strip():
        results = formula.parseString(t)
        print(t.strip(), '->', dict(results.asList()))

This prints:

H -> {'H': 1}
NaCl -> {'Na': 1, 'Cl': 1}
HO -> {'H': 1, 'O': 1}
H2O -> {'H': 2, 'O': 1}
HOH -> {'H': 2, 'O': 1}
(H2O)2 -> {'H': 4, 'O': 2}
(H2O)2OH -> {'H': 5, 'O': 3}
((H2O)2OH)12 -> {'H': 60, 'O': 36}
C6H5OH -> {'H': 6, 'C': 6, 'O': 1}
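The grammar above accepts integer multipliers only, while the question asks for non-integer stoichiometry written with either a comma or a dot. A minimal sketch of the conversion step, in plain Python: the helper name `to_number` is hypothetical, and wiring it into the grammar (for instance as the parse action on a numeric expression widened to accept a decimal separator) is an assumption that is not shown or tested here.

```python
def to_number(text):
    """Convert a matched number such as '2', '2.5', '2,5', '.5' or '2.'.

    A comma is treated as a decimal separator. Strings without a
    separator stay ints so that integer formulas keep integer counts.
    """
    normalized = text.replace(",", ".")
    if "." in normalized:
        return float(normalized)
    return int(normalized)


print(to_number("2"))    # 2
print(to_number("2,5"))  # 2.5
print(to_number(".5"))   # 0.5
```

Keeping whole numbers as ints leaves the printed results above unchanged for formulas without fractional counts.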
answered 2013-09-01T04:11:22.143

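I think I've found a solution myself. I had to write a recursive function that walks the results and outputs the list I wanted: each element with its stoichiometry, without nesting. I had to modify my starting code slightly and use named results for this purpose: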

from pyparsing import Word, Group, ZeroOrMore, Combine,\
     Optional, OneOrMore, ParseException, Literal, nums,\
     Suppress, Dict, Forward

caps = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lowers = caps.lower()
digits = "0123456789"
integer = Word( digits )
parl = Literal("(").suppress()
parr = Literal(")").suppress()

element = Word( caps, lowers )
separator = Literal( "," ).setParseAction(lambda s,l,t: t[0].replace(',','.')) | Literal( "." )

nreal = (Combine( integer + Optional( separator +\
    Optional( integer ) ))\
    | Combine( separator + integer )).setParseAction( lambda s,l,t: [ float(t[0]) ] )



block = Forward()
groupElem = (Group( element('elem') + Optional( nreal, default=1)('esteq') ))('dupla') | \
     Group( parl + block + parr + Optional( nreal,default=1 )('modi'))
block << groupElem + ZeroOrMore( groupElem )
formula = OneOrMore( block )

Here is my function. I hope it helps anyone with a similar problem. I think this solution is pretty ugly... if anyone has a better, more elegant one, I'm all ears!

def solu(formula):
    final = []

    def diver(entr,mult=1):
        resul = list()
        # If modi is not empty, it is an enclosed group
        # and we must multiply everything inside by modi
        if entr.modi != '':
            for y in entr:
                try:
                    resul.append(diver(y,entr.modi))
                except AttributeError:
                    pass
        # Else, it is just an atom, and we return it
        else:
            resul.append(entr.elem)
            resul.append(entr.esteq*mult)
        return resul

    def doubles(entr):
        resul = []
        # If entr does not contain lists
        # It is an atom
        if sum([1 for y in entr if isinstance(y,list)]) == 0:
            final.append(entr)
            return entr
        else:
            # And if it isn't an atom? We dive further
            # and call doubles until it is an atom
            for y in entr:
                doubles(y)


    for member in formula:
        # If member is already an atom, add it directly to final
        if sum([1 for x in diver(member) if isinstance(x,list)]) == 0:
            final.append(diver(member))
        else:
            # If not, call doubles on the clean member (without modi)
            # and it takes care of adding atoms to final
            doubles(diver(member))


    return final

In the end, solu does the job:

>>> solu(formula.parseString('C6H8(OH)4'))
[['C', 6.0], ['H', 8.0], ['O', 4.0], ['H', 4.0]]
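Note that the list above still contains 'H' twice, whereas the question asks for a total per element. One possible follow-up step, sketched here under the assumption that the input is a flat list of [symbol, count] pairs like the one above; `merge_elements` is a hypothetical helper name.

```python
from collections import defaultdict


def merge_elements(pairs):
    """Sum the counts of repeated symbols in a flat list of [symbol, count]."""
    totals = defaultdict(float)
    for symbol, count in pairs:
        totals[symbol] += count
    # Return a plain dict; insertion order follows first appearance
    return dict(totals)


print(merge_elements([['C', 6.0], ['H', 8.0], ['O', 4.0], ['H', 4.0]]))
# {'C': 6.0, 'H': 12.0, 'O': 4.0}
```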
answered 2013-08-31T05:57:35.573