
Honestly, I'm just stumped. The goal is to split a string on a wrapper, but not on that same wrapper when it appears inside the wrapped text.

Take the following string:

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'

The resulting list should be ['something','{','now I am wrapped {I should not cause splitting} I am still wrapped','}','something else']

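The simplest thing I tried was findall, just to see how it would behave. Since a regular expression has no memory, it does not take the nesting into account, so the match ends as soon as it finds the first closing brace. This is what happens: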

>>> import re
>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> re.findall(r'{.*?}',s)
['{now I am wrapped {I should not cause splitting}']

Any ideas on how to make it recognize whether a brace belongs to an inner wrapper?

4 Answers

import re

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
m = re.search(r'(.*)({)(.*?{.*?}.*?)(})(.*)', s)
print(m.groups())

New answer:

s = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
m = re.search(r'([^{]*)({)(.*)(})([^}]*)', s)
print(m.groups())
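
One caveat worth checking, offered as a sketch rather than as part of the original answer: the greedy pattern spans from the first opening brace to the last closing brace, so it appears to suit only strings with a single outermost group. The input `two_groups` below is made up for illustration.

```python
import re

# Greedy pattern from the answer above: first '{' through the last '}'.
pattern = re.compile(r'([^{]*)({)(.*)(})([^}]*)')

nested = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
nested_groups = pattern.search(nested).groups()
print(nested_groups)

# Hypothetical input with two separate top-level groups.
two_groups = 'a{b} c {d}e'
two_groups_result = pattern.search(two_groups).groups()
print(two_groups_result)  # the middle group swallows '} c {'
```

If several top-level groups can occur, a counting approach is probably the safer choice.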
Answered at 2013-09-13T16:32:25.637

I'm not sure this will always do what you need, but you could use partition and rpartition, for example:

In [26]: s_1 = s.partition('{')
In [27]: s_1
Out[27]: 
('something',
 '{',
 'now I am wrapped {I should not cause splitting} I am still wrapped}something else')
In [30]: s_2 = s_1[-1].rpartition('}')
In [31]: s_2
Out[31]: 
('now I am wrapped {I should not cause splitting} I am still wrapped',
 '}',
 'something else')
In [34]: s_out = s_1[0:-1] + s_2
In [35]: s_out
Out[35]: 
('something',
 '{',
 'now I am wrapped {I should not cause splitting} I am still wrapped',
 '}',
 'something else')
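
The session above could be folded into a small helper. This is only a minimal sketch: the name `split_outer` and its default arguments are made up here, and it assumes the string contains exactly one outermost wrapped group.

```python
def split_outer(s, open_ch='{', close_ch='}'):
    """Split s around the first open_ch and the last close_ch."""
    head, opener, rest = s.partition(open_ch)       # split at the first opener
    body, closer, tail = rest.rpartition(close_ch)  # split at the last closer
    return [head, opener, body, closer, tail]


s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
parts = split_outer(s)
print(parts)
```

When the string has no wrapper at all, partition and rpartition return empty strings for the missing separators, so the helper returns the original string followed by four empty strings.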
Answered at 2013-09-13T16:25:24.650

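Based on all the responses, I decided to just write a function that takes the string and the two wrappers and builds the output list by brute-force iteration: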

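def f(string, wrap1, wrap2):
    wrapped = False  # True while inside an outermost wrap1 ... wrap2 pair
    inner = 0        # number of nested wrappers currently open
    count = 0        # index in holds of the piece being built
    holds = ['']
    for c in string:
        if c == wrap1 and not wrapped:
            # Outermost opener: emit it and start a new piece.
            count += 2
            wrapped = True
            holds.append(wrap1)
            holds.append('')
        elif c == wrap1 and wrapped:
            # Nested opener: keep it as ordinary text.
            inner += 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner > 0:
            # Nested closer: keep it as ordinary text.
            inner -= 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner == 0:
            # Outermost closer: emit it and start a new piece.
            wrapped = False
            count += 2
            holds.append(wrap2)
            holds.append('')
        else:
            holds[count] += c
    return holds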

And this shows it working:

>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> f(s,'{','}')
['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'something else']
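
The same counting idea can also be written with a single depth counter and string slices. The sketch below is a variant, not the function above: the name `split_top_level` is invented for illustration, it assumes single-character wrappers that differ from each other, and it raises ValueError on unbalanced input, which the original function does not do.

```python
def split_top_level(text, open_ch='{', close_ch='}'):
    """Split text on outermost wrappers only, raising on unbalanced input."""
    parts = []
    depth = 0
    start = 0  # index where the piece currently being collected begins
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                # Outermost opener: close off the preceding plain text.
                parts.extend([text[start:i], open_ch])
                start = i + 1
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError('unmatched %r at index %d' % (close_ch, i))
            depth -= 1
            if depth == 0:
                # Outermost closer: close off the wrapped text.
                parts.extend([text[start:i], close_ch])
                start = i + 1
    if depth:
        raise ValueError('missing %d closing %r' % (depth, close_ch))
    parts.append(text[start:])
    return parts


s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
print(split_top_level(s))
print(split_top_level('a{b}c{d}e'))
```

Slicing avoids rebuilding each piece one character at a time, and the explicit errors make malformed input visible instead of silently producing a partial list.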
Answered at 2013-09-13T17:28:32.940

You can solve this with the Scanner class of the re module.

I use the following list of strings as a test:

l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']

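I create a class that keeps track of how many curly braces are currently open, together with the text collected between the outermost pair. It has three methods: one matches an opening curly brace, one matches a closing curly brace, and the last one matches the text in between. Depending on whether the stack (the opened_cb variable) is empty, I do different things: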

class Cb():

    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0

    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None

    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        self.results.append(token)
        return None

    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None
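
For readers who have not met re.Scanner, here is a minimal sketch of the mechanism the class relies on. The callbacks `keep` and `drop` and the sample input are invented for illustration, and since Scanner is, as far as I know, not covered in the official re documentation, treat this as an observation of how CPython currently behaves.

```python
import re


def keep(scanner, token):
    return token  # returned values are collected in the result list


def drop(scanner, token):
    return None   # returning None leaves the token out of the result


scanner = re.Scanner([
    (r'[a-z]+', keep),
    (r'\s+', drop),
])

# scan() returns the collected tokens and whatever text it could not match.
tokens, remainder = scanner.scan('split on  spaces 42')
print(tokens)
print(remainder)
```

Each callback receives the scanner and the matched text, which is why the methods of Cb can decide per token whether to emit it right away or to buffer it in self.results.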

Finally, I create the Scanner and flatten the results into a simple list:

for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))

Here is the complete program, followed by a run to show the results:

import re

l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']


class Cb():

    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0

    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None

    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        # Nested opener: keep it as part of the wrapped text.
        self.results.append(token)
        return None

    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None

for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))

Run it like this:

python3 script.py

That yields:

Original string --> something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else
Result --> ['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'everything else']


Original string --> something{now I am wrapped} here {and there} listen
Result --> ['something', '{', 'now I am wrapped', '}', ' here ', '{', 'and there', '}', ' listen']


Original string --> something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now I am wrapped {I should {not} cause splitting} I am still wrapped', '}', 'everything']


Original string --> something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped', '}', 'everything']
Answered at 2013-09-15T11:06:03.630