pyparsing - Using pyparsing to query a chemical element database

Question

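I would like to parse queries against a chemical element database.

The database is stored in an XML file. Parsing that file produces a nested dictionary, which is stored in a singleton object that inherits from collections.OrderedDict.

Asking for an element gives me an ordered dictionary of its properties (i.e. ELEMENTS['C'] --> {'name':'carbon','neutron' : 0,'proton':6, ...}).

Conversely, asking for a property gives me an ordered dictionary holding that property's value for every element (i.e. ELEMENTS['proton'] --> {'H' : 1, 'He' : 2, ...}).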
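To make the two lookup directions concrete, here is a minimal sketch of how such a container could behave. The class name, the use of `__missing__`, and the three sample entries are assumptions made for illustration; the real database is built from the XML file and its singleton mechanics are not reproduced here.

```python
from collections import OrderedDict


class ElementsDatabase(OrderedDict):
    """Hypothetical stand-in: maps element symbol -> OrderedDict of properties."""

    def __missing__(self, key):
        # Called only when `key` is not an element symbol: treat it as a
        # property name and collect that property across all elements.
        column = OrderedDict(
            (symbol, props[key]) for symbol, props in self.items() if key in props
        )
        if not column:
            raise KeyError(key)
        return column


# Illustrative sample entries (assumed, not taken from the real XML file).
ELEMENTS = ElementsDatabase()
ELEMENTS["H"] = OrderedDict([("name", "hydrogen"), ("proton", 1), ("neutron", 0)])
ELEMENTS["He"] = OrderedDict([("name", "helium"), ("proton", 2), ("neutron", 2)])
ELEMENTS["C"] = OrderedDict([("name", "carbon"), ("proton", 6), ("neutron", 6)])

print(ELEMENTS["C"]["name"])             # carbon
print(list(ELEMENTS["proton"].items()))  # [('H', 1), ('He', 2), ('C', 6)]
```

Looking up a key that is neither an element symbol nor a property name still raises KeyError, as a plain dictionary would.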

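A typical query could be:

mass > 10 or (nucleon < 20 and atomic_radius < 5)

where each "subquery" (e.g. mass > 10) returns the set of elements that match it.

The query is then converted internally into a string that is evaluated afterwards to produce the set of indices of the matching elements. In this context, the and/or operators are not boolean operators but set operators (intersection and union) acting on Python sets.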
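The intended semantics of the typical query can be sketched with plain Python sets, with no parsing involved. The `select` helper and the sample values below are assumptions made for illustration only (the numbers are toy values, not real physical data).

```python
# Toy data: element symbol -> properties. Values are placeholders.
ELEMENTS = {
    "H": {"mass": 1.0, "nucleon": 1, "atomic_radius": 0.5},
    "He": {"mass": 4.0, "nucleon": 4, "atomic_radius": 7.0},
    "C": {"mass": 12.0, "nucleon": 12, "atomic_radius": 0.7},
    "Fe": {"mass": 55.8, "nucleon": 56, "atomic_radius": 1.0},
}


def select(prop, predicate):
    """Return the set of element symbols whose `prop` value satisfies `predicate`."""
    return {symbol for symbol, props in ELEMENTS.items() if predicate(props[prop])}


heavy = select("mass", lambda v: v > 10)
few_nucleons = select("nucleon", lambda v: v < 20)
small = select("atomic_radius", lambda v: v < 5)

# "or" in the query means set union, "and" means set intersection.
result = heavy | (few_nucleons & small)
print(sorted(result))  # ['C', 'Fe', 'H']

# Python's own `or` keyword is NOT a set operator: it simply returns the
# first non-empty operand, so it would silently drop 'H' here.
keyword_result = heavy or (few_nucleons and small)
print(sorted(keyword_result))  # ['C', 'Fe']
```

This is why the generated expression has to use `|` and `&` (or `union` and `intersection`) rather than the `or` and `and` keywords.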

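I recently posted a question about building such a query. Thanks to the useful answers I received, I think I have more or less got it working (in a good way, I hope!), but I still have a few questions related to pyparsing.

Here is my code: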

import numpy

from pyparsing import *

# This imports a singleton object storing the database dictionary as
# described earlier
from ElementsDatabase import ELEMENTS

and_operator = oneOf(['and','&'], caseless=True) 
or_operator  = oneOf(['or' ,'|'], caseless=True) 

# ELEMENTS.properties is a property getter that returns the list of 
# registered properties in the database
props = oneOf(ELEMENTS.properties, caseless=True)

# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props
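# When parsed, it must be replaced by the following expression, which
# will be evaluated later. list() keeps this working on Python 3, where
# values() returns a view rather than a list.
props.setParseAction(lambda t : "numpy.array(list(ELEMENTS['%s'].values()))" % t[0].lower())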

quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
# The integer part is mandatory so that the pattern cannot match an empty string.
float_  = Regex(r'[+-]?\d+(\.\d*)?([eE][+-]?\d+)?').setParseAction(lambda t:float(t[0]))

comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
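# The comparison goes inside numpy.where(); [0] extracts the index array.
# %r keeps quoted string values quoted in the generated expression.
comparison_expr.setParseAction(lambda t : "set(numpy.where(%s %s %r)[0])" % tuple(t))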

grammar = Combine(operatorPrecedence(comparison_expr, [(and_operator, 2, opAssoc.LEFT), (or_operator, 2, opAssoc.LEFT)]))

# A test query
res = grammar.parseString('"mass     "  >  30 or (nucleon == 1)',parseAll=True)

print(eval(' '.join(res._asStringList())))

My questions are as follows:

1 Using 'transformString' instead of 'parseString' never raises an
  exception, even when the string to be parsed does not match the grammar.
  However, raising an exception in that case is exactly the functionality
  I need. Is there a way to get it?

2 I would like to reintroduce white space between my tokens so that
my eval does not fail. The only way I found to do so is the one
implemented above. Can you see a better way to do it with pyparsing?

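Sorry for the long post, but I wanted to describe the context in some depth. By the way, if you think this approach is a bad one, do not hesitate to tell me!

Thanks a lot for your help.

Eric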

score 1 · Accepted Answer

Never mind my concerns, I found a workaround. I used the SimpleBool.py example shipped with pyparsing (thanks to Paul for the hint).

Basically, I used the following approach:

1 For each subquery (e.g. mass > 10), using the setParseAction method,
I attached a function that returns the set of elements that match
the subquery.

2 Then, I attached the following functions to each logical operator (and,
or and not):

def not_operator(token):

    _, s = token[0]

    # ELEMENTS is the singleton described in my original post
    return set(ELEMENTS.keys()).difference(s)

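def and_operator(token):

    s1, _, s2 = token[0]

    # Set intersection: the "and" keyword would just return one operand
    return s1.intersection(s2)

def or_operator(token):

    s1, _, s2 = token[0]

    # Set union: the "or" keyword would just return one operand
    return s1.union(s2)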

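# Thanks to Paul for the hint.
# not_token, and_token and or_token are the pyparsing expressions that
# match the operator keywords (their definitions are not shown here).
grammar = operatorPrecedence(comparison_expr,
          [(not_token, 1, opAssoc.RIGHT, not_operator),
           (and_token, 2, opAssoc.LEFT, and_operator),
           (or_token, 2, opAssoc.LEFT, or_operator)])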

Please note that these operators act upon Python sets rather than
on booleans.
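A minimal, self-contained sketch of this approach follows. It is an illustration under stated assumptions, not the code from the actual project: the toy ELEMENTS dictionary, the COMPARATORS table, and the names compare, not_action, and_action, or_action and run_query are all hypothetical. It uses infixNotation, the newer name for operatorPrecedence, and the operator actions take every other token of the group so that chains such as "a and b and c" also work.

```python
import operator

import pyparsing as pp

# Hypothetical toy data standing in for the ELEMENTS singleton.
ELEMENTS = {
    "H": {"mass": 1.0, "nucleon": 1},
    "He": {"mass": 4.0, "nucleon": 4},
    "C": {"mass": 12.0, "nucleon": 12},
}
COMPARATORS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
               ">=": operator.ge, "<": operator.lt, "<=": operator.le}

prop = pp.oneOf(["mass", "nucleon"])
number = pp.Regex(r"[+-]?\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))
comparison = prop + pp.oneOf(list(COMPARATORS)) + number


def compare(tokens):
    # A subquery such as "mass > 10" becomes the set of matching symbols.
    name, op, value = tokens
    return {s for s, p in ELEMENTS.items() if COMPARATORS[op](p[name], value)}


comparison.setParseAction(compare)


def not_action(tokens):
    _, operand = tokens[0]
    return set(ELEMENTS) - operand


def and_action(tokens):
    # tokens[0] looks like [set, 'and', set, ...]; operands sit at even indices.
    return set.intersection(*tokens[0][0::2])


def or_action(tokens):
    return set.union(*tokens[0][0::2])


query = pp.infixNotation(
    comparison,
    [
        (pp.CaselessKeyword("not"), 1, pp.opAssoc.RIGHT, not_action),
        (pp.CaselessKeyword("and"), 2, pp.opAssoc.LEFT, and_action),
        (pp.CaselessKeyword("or"), 2, pp.opAssoc.LEFT, or_action),
    ],
)


def run_query(text):
    # parseAll=True makes pyparsing raise ParseException on leftover input.
    return query.parseString(text, parseAll=True)[0]


print(sorted(run_query("mass > 10 or (nucleon < 20 and mass < 2)")))  # ['C', 'H']
```

Because the parse actions return sets directly, there is no intermediate string to eval and no white space to restore. With parseAll=True, a malformed query such as "mass > 10 or" raises pp.ParseException instead of passing through silently.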

That does the job.

I hope this approach will be helpful to some of you.

Eric
