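I'm a Python/programming newbie and have been at it for a few months. I hope this code isn't too big or over the top for SO, but I don't know how to ask the question without the full context. So here goes: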

import re
import itertools

nouns = ['bacon', 'cheese', 'eggs', 'milk', 'fish', 'houses', 'dog']
CC = ['and', 'or']

def replacer_factory():
    def create_permutations(match):
        group1_string = (match.group(1)[:-1]) # strips trailing whitespace
        # creates list of matched.group() with word 'and' or 'or' removed
        # list comprehension instead of filter() so this also runs on Python 3,
        # where filter() returns an iterator that cannot be added to a list
        nouns2 = [n for n in re.split(r',\s*', group1_string) if n] + [match.group(3)]
        CC_match = match.group(2) # this either matches word 'and' or 'or'

        # create list that holds the permutations created in for loop below
        perm_list = []
        for comb in itertools.permutations(nouns2):
            comb_len = len(comb)
            if comb_len == 2:
                perm_list.append(' '.join((comb[0], CC_match, comb[-1])))

            elif comb_len == 3:
                perm_list.append(', '.join((comb[0], comb[1], CC_match, comb[-1])))

            elif comb_len == 4:
                perm_list.append(', '.join((comb[0], comb[1], comb[2], CC_match, comb[-1])))

        # does the match.group contain word 'and' or 'or'
        if (match.group(2)) == "and":
            joined = '*'.join(perm_list)
            strip_comma = joined.replace("and,", "and")
            completed = '|'+strip_comma+'|'
            return completed

        elif (match.group(2)) == "or":
            joined = '*'.join(perm_list)
            strip_comma = joined.replace("or,", "or")
            completed = '|'+strip_comma+'|'
            return completed       

    return create_permutations

def search_and_replace(text):
    # use the 'nouns' and 'CC' lists to find a noun list phrase
    # e.g. 'bacon, eggs, and milk' is one example of a match
    noun_patt = r'\b(?:' + '|'.join(nouns) + r')\b'
    CC_patt = r'\b(' + '|'.join(CC) + r')\b' 
    patt = r'((?:{0},? )+){1} ({0})'.format(noun_patt, CC_patt)

    replacer = replacer_factory()
    return re.sub(patt, replacer, text)

def main():
    with open('test_sentence.txt') as input_f:
        read_f = input_f.read()

    with open('output.txt', 'w') as output_f:
        output_f.write(search_and_replace(read_f))


if __name__ == '__main__':
    main()

Contents of 'test_sentence.txt':

I am 2 list with 'or': eggs or cheese.
I am 2 list with 'and': milk and eggs.
I am 3 list with 'or': cheese, bacon, and eggs.
I am 3 list with 'and': bacon, milk and cheese.
I am 4 list: milk, bacon, eggs, and cheese.
I am 5 list, I don't match.
I am 3 list with non match noun: cheese, bacon and pie.

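So, the code all works fine, but I've run into a limitation that I don't know how to get around. The limitation is in the for loop. As it stands, I have only written 'if' and 'elif' statements that go as far as elif comb_len == 4:. I actually want this to be open-ended, carrying on to elif comb_len == 5:, elif comb_len == 6:, elif comb_len == 7:, and so on. (In reality I don't need to go beyond elif comb_len == 20:, but the point is the same: I want to allow for the possibility.) However, writing that many 'elif' statements is impractical.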

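Any ideas on how to tackle this?

Please note that 'test_sentence.txt' and the 'nouns' list shown here are only samples. My actual 'nouns' list has on the order of 1000 entries, and the text I will be processing is a much larger document than what is in 'test_sentence.txt'.

Cheers, Darren

PS - I struggled to come up with a suitable title!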

1 Answer

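If you look closely, each branch of the if-elif statement follows roughly the same structure: you take every element of comb except the last, add CC_match, and then add the last item.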

If we write that out as code, we get something like this:

head = list(comb[0:-1])
head.append(CC_match)
head.append(comb[-1])
perm_list.append(', '.join(head))

Then, inside the for loop, you can replace the if-elif statement:

for comb in itertools.permutations(nouns2):
    head = list(comb[0:-1])
    head.append(CC_match)
    head.append(comb[-1])
    perm_list.append(', '.join(head))
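
One detail worth checking: the loop above joins everything with ', ', so a two-item match such as ('eggs', 'cheese') comes out as 'eggs, or, cheese', and after the question's replace("or,", "or") step it becomes 'eggs, or cheese' rather than the 'eggs or cheese' that the original comb_len == 2 branch produced. A minimal sketch of one way to keep the original output for pairs is below. The helper names join_with_conjunction and permuted_phrases are hypothetical (not from the question), and the sketch assumes each tuple has at least two items.

```python
import itertools


def join_with_conjunction(items, conjunction):
    """Join items as 'a or b' (two items) or 'a, b, or c' (three or more).

    Hypothetical helper; assumes len(items) >= 2.
    """
    items = list(items)
    if len(items) == 2:
        # Same output as the original comb_len == 2 branch: no commas at all.
        return ' '.join((items[0], conjunction, items[1]))
    # Commas between the leading items, then ', <conjunction> <last item>'.
    # The conjunction is never followed by a comma, so no replace() is needed.
    return ', '.join(items[:-1]) + ', ' + conjunction + ' ' + items[-1]


def permuted_phrases(nouns2, CC_match):
    """Build one phrase per permutation of nouns2 (hypothetical wrapper)."""
    return [join_with_conjunction(comb, CC_match)
            for comb in itertools.permutations(nouns2)]
```

With this approach, perm_list = permuted_phrases(nouns2, CC_match) would stand in for the whole for loop, and the later replace("and,", "and") / replace("or,", "or") step should no longer be needed; that step could also alter a noun that happens to end in 'and' or 'or' followed by a comma (for example 'sand,'). Keep in mind that the number of permutations grows factorially: a 10-item list already yields 3,628,800 phrases, so lists anywhere near 20 items are unlikely to be practical.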

You should also consider adding some error checking so that the program doesn't behave strangely when the length of comb is 0 or 1.
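
As a rough sketch of that kind of check, the loop could be wrapped in a function that refuses to run on fewer than two nouns. The function name build_perm_list and the choice to raise ValueError are assumptions made for illustration; returning an empty list or leaving the matched text unchanged would be equally reasonable choices.

```python
import itertools


def build_perm_list(nouns2, CC_match):
    """Hypothetical wrapper around the loop above, with a length guard."""
    # Every permutation has the same length as nouns2, so checking once
    # up front covers every comb produced below.
    if len(nouns2) < 2:
        raise ValueError(
            "need at least two nouns, got %d" % len(nouns2))

    perm_list = []
    for comb in itertools.permutations(nouns2):
        head = list(comb[0:-1])      # everything except the last item
        head.append(CC_match)        # then the conjunction
        head.append(comb[-1])        # then the last item
        perm_list.append(', '.join(head))
    return perm_list
```

With the regular expression from the question, a match always seems to contain at least two nouns (one or more in the first group plus the final one), so this guard is mainly a safety net in case the pattern changes later.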

Answered 2013-09-27T06:37:33.757