parsing - Parse a specific number of lines based on a previously parsed value

Question

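I'm using pyparsing to parse the gEDA schematic/symbol file format. Most of it is straightforward, but I'm not sure how to match a number of following lines that is given by an integer field on the initial line.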

Text objects have the following format:

(other objects)
T x y color size vis snv angle align num_lines
Text line one
Line two of the text
Finally, the 'num_lines'th line
(other objects)

num_lines is an integer. This style is also used for a few other object types.
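To make the required behaviour concrete, here is a minimal plain-Python sketch (no pyparsing) of what the grammar has to do. It assumes each text object header is a single line whose last whitespace-separated field is num_lines; the function name read_text_objects and the sample data are hypothetical. The key point is that exactly num_lines following lines are consumed by count, so a text line that looks like another object is still treated as text.

```python
def read_text_objects(lines):
    """Collect (header_fields, text_lines) for each 'T' object.

    Hypothetical helper. Assumes the header's last field is num_lines and
    that exactly num_lines following lines belong to the object, even if
    they happen to look like another object header.
    """
    objects = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        if fields and fields[0] == "T":
            num_lines = int(fields[-1])
            body = lines[i + 1:i + 1 + num_lines]
            if len(body) < num_lines:
                raise ValueError(f"text object at line {i + 1} is truncated")
            objects.append((fields, body))
            i += 1 + num_lines  # skip the header and its counted lines
        else:
            # Other object types are simply skipped in this sketch.
            i += 1
    return objects


sample = [
    "T 41600 47800 9 10 1 0 0 0 2",
    "This is line 1",
    "T looks like a header but is text",
    "T 41600 47000 9 10 1 0 0 0 1",
    "Another first line",
]
for header, body in read_text_objects(sample):
    print(header[1:3], body)
```

Because the count, not the content, decides where a text object ends, the second line of the first object is kept as text even though it starts with T.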

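As a workaround, I'm defining such lines as anything that does not match a valid object type. Technically, though, object-like lines are allowed inside a text object, so this workaround can misparse them: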

text_meta = (Type("T") + coord + color + size + visibility + show_name_value
             + angle + alignment + num_lines + EOL)
text_data_line = ~obj + LineStart() + SkipTo(LineEnd()) + EOL
text_data = Group(OneOrMore(text_data_line)).setResultsName('text')
text_data = text_data.setParseAction(lambda t: '\n'.join(t[0]))
text = text_meta + text_data

Generating matching rules on the fly, for example:

def genLineMatcher(n):
    return (LineStart() + SkipTo(LineEnd()) + EOL) * n

is an option, but I don't know how to plug such a rule into the grammar.

score 0 · Accepted Answer

Generating matching rules on the fly...

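You're actually on the right track. The way to create a rule dynamically is to define the variable-length expression as a Forward(), and then, in a parse action attached to the count field, insert the actual rule once the count has been parsed.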
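A minimal sketch of that Forward technique, assuming a simplified header of the form "T x y num_lines" rather than the full gEDA field list. The names counted_lines and text_obj are hypothetical, not pyparsing API; only Forward, Group, LineEnd, Literal, Word, Empty, nums, restOfLine, setParseAction and parseString are pyparsing's own.

```python
from pyparsing import Empty, Forward, Group, LineEnd, Literal, Word, nums, restOfLine


def counted_lines():
    """Hypothetical helper: a count field followed by exactly that many lines.

    Assumes the count is the last field on its line; each counted line is
    the text that follows the next newline.
    """
    # One counted line: the newline ending the previous line, then the
    # remainder of the next line.
    line = LineEnd().suppress() + restOfLine
    body = Forward()  # placeholder, filled in once the count is known

    def set_body(tokens):
        n = int(tokens[0])
        # Rebind the Forward to a rule that matches exactly n lines.
        body << Group(line * n if n else Empty())
        return []  # drop the count token from the results

    count = Word(nums).setParseAction(set_body, callDuringTry=True)
    return count + body


# Simplified stand-in for the gEDA text header: "T x y num_lines".
text_obj = Literal("T") + Word(nums) + Word(nums) + counted_lines()

print(text_obj.parseString("T 10 20 2\nfirst line\nsecond line").asList())
# → ['T', '10', '20', ['first line', 'second line']]
```

Note that set_body uses "body << ..." rather than "body <<= ...": the augmented assignment would make body a local name inside set_body and fail at parse time.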

Fortunately, pyparsing already implements this technique in the helper method countedArray. If you change your expression to:

text_meta = (Type("T") + coord + color + size + visibility + show_name_value +
             angle + alignment + countedArray(EOL + restOfLine)("lines"))

I think this will do what you want. You can then retrieve the array of lines using the "lines" results name.

score 0 · Accepted Answer

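The pyparsing helper function countedArray(expr) is almost what's needed. The difference is that here the count sits at the end of the header line, so the helper below matches expr one extra time to consume the rest of that line and then drops the resulting empty string. The parser definition and the modified helper function: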

from pyparsing import And, Forward, Group, Word, empty, nums

def numLinesList(expr, name=None):
    """Helper to snarf an end-of-line integer and match 'expr' N times after.
    Almost exactly like pyparsing.countedArray.
    Matches patterns of the form::
        ... num_lines
        line one
        line two
        num_lines'th line
    """
    arrayExpr = Forward()
    def numLinesAction(s, l, t):
        n = int(t[0])
        # n+1 copies: the first match consumes the (empty) remainder of the
        # line that holds the count
        arrayExpr << (Group(And([expr] * (n + 1))) if n else Group(empty))
        return []
    matcher = (Word(nums).setParseAction(numLinesAction, callDuringTry=True)
               + arrayExpr)
    # remove first empty string
    matcher.addParseAction(lambda t: [t[0][1:]])
    if name:
        matcher = matcher.setResultsName(name)
    return matcher

text_meta = (Type("T") + coord + color + size + visibility + show_name_value
             + angle + alignment)
text_data_line = SkipTo(LineEnd()) + EOL
text_data = numLinesList(text_data_line, 'text')
text = text_meta + text_data

For the following input fragment:

...
T 41600 47800 9 10 1 0 0 0 2
This is line 1
line 2 is here...
T 41600 47000 9 10 1 0 0 0 2
Another first line
second line foo

this produces:

['T', 41600, 47800, '9', 10, True, '0', 0, 0, ['This is line 1', 'line 2 is here...']]
['T', 41600, 47000, '9', 10, True, '0', 0, 0, ['Another first line', 'second line foo']]
