Folks,

I have a "make"-like file that I need to parse. The syntax looks like this:

samtools=/path/to/samtools
picard=/path/to/picard

task1: 
    des: description
    path: /path/to/task1
    para: [$global.samtools,
           $args.input,
           $path
          ]

task2: task1

where $global contains the variables defined in the global scope, $path is a "local" variable, and $args contains the key/value pairs passed in by the user.

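I would like to parse this file with a Python library, preferably one that returns a parse tree. If the file contains errors, the library should report them. I found these: CodeTalker, yeanpypa. Can they be used in this case? Any other suggestions?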

2 Answers

I had to guess what your makefile structure allows based on your example, but this should get you close:

from pyparsing import *
# elements of the makefile are delimited by line, so we must
# define skippable whitespace to include just spaces and tabs
ParserElement.setDefaultWhitespaceChars(' \t')
NL = LineEnd().suppress()

EQ,COLON,LBRACK,RBRACK = map(Suppress, "=:[]")
identifier = Word(alphas+'_', alphanums)

symbol_assignment = Group(identifier("name") + EQ + empty + 
                          restOfLine("value"))("symbol_assignment")
symbol_ref = Word("$",alphanums+"_.")

def only_column_one(s,l,t):
    if col(l,s) != 1:
        raise ParseException(s,l,"not in column 1")
# task identifiers have to start in column 1
task_identifier = identifier.copy().setParseAction(only_column_one)

task_description = "des:" + empty + restOfLine("des")
task_path = "path:" + empty + restOfLine("path")
task_para_body = delimitedList(symbol_ref)
task_para = "para:" + LBRACK + task_para_body("para") + RBRACK
task_para.ignore(NL)
task_definition = Group(task_identifier("target") + COLON + 
        Optional(delimitedList(identifier))("deps") + NL +
        (
        Optional(task_description + NL) & 
        Optional(task_path + NL) & 
        Optional(task_para + NL)
        )
    )("task_definition")

makefile_parser = ZeroOrMore(
    symbol_assignment |
    task_definition |
    NL
    )


if __name__ == "__main__":
    test = """\
samtools=/path/to/samtools
picard=/path/to/picard

task1:  
    des: description 
    path: /path/to/task1 
    para: [$global.samtools, 
           $args.input, 
           $path 
          ] 

task2: task1 
"""

    # dump out what we parsed, including results names
    for element in makefile_parser.parseString(test):
        print(element.getName())
        print(element.dump())
        print()

Prints:

symbol_assignment
['samtools', '/path/to/samtools']
- name: samtools
- value: /path/to/samtools

symbol_assignment
['picard', '/path/to/picard']
- name: picard
- value: /path/to/picard

task_definition
['task1', 'des:', 'description ', 'path:', '/path/to/task1 ', 'para:', 
 '$global.samtools', '$args.input', '$path']
- des: description 
- para: ['$global.samtools', '$args.input', '$path']
- path: /path/to/task1 
- target: task1

task_definition
['task2', 'task1']
- deps: ['task1']
- target: task2

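The dump() output shows which names you can use to get at the fields within the parsed elements, or to tell apart the kinds of elements you have. dump() is a handy, generic tool for displaying whatever pyparsing has parsed. Here is some code more specific to your particular parser, showing how to use the field names either as dotted object references (element.target, element.deps, element.name, etc.) or as dict-style references (element[key]):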

for element in makefile_parser.parseString(test):
    if element.getName() == 'task_definition':
        # print the target, followed by its dependencies if it has any
        if element.deps:
            print("TASK:", element.target, "DEPS:(" + ','.join(element.deps) + ")")
        else:
            print("TASK:", element.target)
        # optional fields are only present if they appeared in the task body
        for key in ('des', 'path', 'para'):
            if key in element:
                print(" ", key.upper()+":", element[key])

    elif element.getName() == 'symbol_assignment':
        print("SYM:", element.name, "->", element.value)

Prints:

SYM: samtools -> /path/to/samtools
SYM: picard -> /path/to/picard
TASK: task1
  DES: description 
  PATH: /path/to/task1 
  PARA: ['$global.samtools', '$args.input', '$path']
TASK: task2 DEPS:(task1)
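
Once the elements are parsed, the $global, $args and $path references described in the question still have to be expanded. The following is only a rough sketch of one way to do that with plain dicts built from the parsed fields. The helper name resolve_ref, the dict names, and the value "reads.bam" are all made up for illustration; none of them come from pyparsing or from the original file.

```python
# Hypothetical helper (not part of pyparsing): expands "$..." references
# using plain dicts collected from the parsed elements.
def resolve_ref(ref, global_vars, args, local_vars):
    """Return the value that a reference such as '$global.samtools' points to."""
    if not ref.startswith("$"):
        raise ValueError("not a symbol reference: %r" % ref)
    # "$global.samtools" -> scope "global", name "samtools"
    # "$path"            -> scope "path",   name ""
    scope, _, name = ref[1:].partition(".")
    try:
        if name == "":
            # no dot: a task-local variable such as $path
            return local_vars[scope]
        if scope == "global":
            return global_vars[name]
        if scope == "args":
            return args[name]
    except KeyError:
        raise KeyError("undefined symbol: " + ref) from None
    raise ValueError("unknown scope in " + ref)


# Values shaped like the parsed results shown above (already stripped).
global_vars = {"samtools": "/path/to/samtools", "picard": "/path/to/picard"}
local_vars = {"path": "/path/to/task1"}
args = {"input": "reads.bam"}  # assumed user-supplied value
para = ["$global.samtools", "$args.input", "$path"]

resolved = [resolve_ref(p, global_vars, args, local_vars) for p in para]
print(resolved)
```

Note that the values captured with restOfLine keep their trailing whitespace (see 'description ' in the dump above), so you would probably want to call strip() on them before storing them in the dicts.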

answered 2012-06-13T08:45:41.390

I have used pyparsing in the past and have been very happy with it (q.v., the pyparsing project site).

answered 2012-06-13T05:17:56.710