1

我想,给定一个 JSGF 语法,生成它将映射到的所有终端字符串。例如,给定A (B | C) D [E],我想要的输出是:

A B D E
A C D E
A B D
A C D

我决定从最简单的项目开始,可选的支架,但很快就遇到了砖墙。它适用于 1 个项目,但不适用于具有替代品的项目。任何意见,将不胜感激。

我现在拥有的:

import re
rule = raw_input('Enter the rule you want to test: ')
items = re.findall(r"\w[\w ]*\w|\w|\[|\]|\(|\)", rule)
for anitem in range(len(items)):
    bracketc = items[:anitem].count('[') - items[:anitem].count(']')
    if items[anitem] != '[' and items[anitem] != ']':
        if bracketc > 0:
            optional = True
        else:
            optional = False
        while optional == True:
            print ' '.join(items)
            it2 = items[:]
            it2.remove(it2[anitem])
            print ' '.join(it2)
            break

它适用于 1 个项目,并给定字符串 AB [C] D,返回:

A B [ C ] D
A B [ ] D

但随着复杂性的增加而崩溃,所以我猜我需要一些完全不同的东西。

4

2 回答 2

1

从您的示例中,我编写了以下代码:

rule="A(B|C)D[E]FG"

def generate_strings(rule):
    if not rule:
        return [""]
    begin,end=rule[0],rule[1:]
    if begin=='[':
        i=end.find(']')
        if i==-1:
            raise Exception("Unmatched '['")
        alt,end=end[0:i],end[i+1:]
        return [a+e for e in generate_strings(end) for a in [alt,""]]
    if begin=='(':
        i=end.find(')')
        if i==-1:
            raise Exception("Unmatched '('")
        alt,end=end[0:i].split('|'),end[i+1:]
        return [a+e for e in generate_strings(end) for a in alt]
    if begin in [']',')','|']:
        raise Exception("Unexpected " + begin)
    return [begin + e for e in generate_strings(end)]

print generate_strings(rule)

编辑: 这是使事情与嵌套表达式一起工作的尝试。它并不是一直都有效,因为现在解析更加精细:当我们找到一个右括号时,它可能不是我们想要的那个,而是嵌套表达式的那个。管道和括号也是如此。

def flatten(l):
    return [item for sublist in l for item in sublist]

def generate_strings(rule):
    if not rule:
        return [""]
    begin,end=rule[0],rule[1:]
    if begin=='[':
        i=end.find(']')
        if i==-1:
            raise Exception("Unmatched '['")
        alt=flatten([generate_strings(a) for a in [end[0:i],""]])
        end=end[i+1:]
        return [a+e for e in generate_strings(end) for a in alt]
    if begin=='(':
        i=end.find(')')
        if i==-1:
            raise Exception("Unmatched '('")
        alt=flatten([generate_strings(a) for a in end[0:i].split('|')])
        end=end[i+1:]
        return [a+e for e in generate_strings(end) for a in alt]
    if begin in [']',')','|']:
        raise Exception("Unexpected " + begin)
    return [begin + e for e in generate_strings(end)]

print generate_strings(rule)
于 2013-06-18T18:13:50.803 回答
0

Josay 答案的生成器形式:

def generate_strings(rule):
    if not rule:
        yield ""
    else:
        begin, end = rule[0], rule[1:]
        if begin == '[':
            i = end.find(']')
            if i == -1:
                raise ValueError("Unmatched '['")
            optional, end = end[:i], end[i+1:]
            for e in generate_strings(end):
                yield e
                yield optional + e
        elif begin == '(':
            i = end.find(')')
            if i == -1:
                raise ValueError("Unmatched '('")
            parts, end = end[:i].split('|'), end[i+1:]

            for e in generate_strings(end):
                for p in parts:
                    yield p + e
        elif begin in '])|':
            raise ValueError("Unexpected " + begin)
        else:
            for e in generate_strings(end):
                yield begin + e
>>> list(generate_strings("A(B|C)D[E]FG"))
['ABDFG', 'ACDFG', 'ABDEFG', 'ACDEFG']
于 2013-06-18T18:49:43.603 回答