python - Remove text between () and []

Question

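I have a very long text string that contains () and []. I am trying to remove the characters between the parentheses and the square brackets, but I cannot figure out how.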

The string looks something like this:

x = "This is a sentence. (once a day) [twice a day]"

This is not the string I am actually working with, but it is very similar and much shorter.

score 121 · Accepted Answer

You can use the re.sub function.

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"([\(\[]).*?([\)\]])", r"\g<1>\g<2>", x)
'This is a sentence. () []'

If you also want to remove the [] and () themselves, you can use this code:

>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub(r"[\(\[].*?[\)\]]", "", x)
'This is a sentence.  '

Important: this code does not work with nested brackets.

Explanation

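The first regex captures ( or [ into group 1 (by wrapping it in parentheses) and ) or ] into group 2, and matches those groups together with all the characters between them. After matching, the matched portion is replaced with group 1 followed by group 2, so the resulting string has nothing left inside the brackets. The second regex is self-explanatory: it matches everything, brackets included, and replaces it with an empty string.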

-- Modified based on Ajay Thomas's comment
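To make the explanation above concrete, here is a minimal sketch that wraps the two substitutions in functions and shows what the nesting caveat looks like in practice. The function names `empty_brackets` and `strip_brackets` and the `nested` sample string are made up for illustration and are not part of the answer.

```python
import re

def empty_brackets(text):
    # Group 1 is the opening bracket, group 2 the closing one;
    # the lazy .*? between them is dropped by the replacement.
    return re.sub(r"([(\[]).*?([)\]])", r"\g<1>\g<2>", text)

def strip_brackets(text):
    # Same pattern without groups: brackets and contents are both removed.
    return re.sub(r"[(\[].*?[)\]]", "", text)

sample = "This is a sentence. (once a day) [twice a day]"
print(repr(empty_brackets(sample)))  # 'This is a sentence. () []'
print(repr(strip_brackets(sample)))  # 'This is a sentence.  '

# Nested input: the lazy match stops at the first closing bracket,
# so part of the outer group is left behind.
nested = "keep (outer (inner) tail) end"
print(repr(strip_brackets(nested)))  # 'keep  tail) end'
```

Note that the pattern does not check that the opening and closing brackets are of the same kind, so something like `(a]` is treated as a pair as well.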

score 22 · Accepted Answer

Run this script; it works even with nested brackets.
It uses only basic logic tests.

def a(test_str):
    ret = ''
    skip1c = 0
    skip2c = 0
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))

In case you don't run it,
here is the output:

>>> 
ewq This is a sentence.  
'ewq This is a sentence.  '
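For comparison with the counter-based function above, nesting can also be handled with a regex, provided the substitution is repeated. This is a different technique from the one in the answer: the sketch below removes innermost pairs (those containing no other bracket characters) and loops until nothing changes. The names `INNERMOST` and `strip_nested` are hypothetical.

```python
import re

# An innermost pair: an opener, no bracket characters at all, then the
# matching closer of the same kind.
INNERMOST = re.compile(r"\([^()\[\]]*\)|\[[^()\[\]]*\]")

def strip_nested(text):
    # Each pass peels off one nesting level; stop when a pass removes nothing.
    while True:
        new_text, removed = INNERMOST.subn("", text)
        if removed == 0:
            return new_text
        text = new_text

print(repr(strip_nested("a (b [c] (d)) e [f (g)]")))  # 'a  e '
```

Unlike the counter approach, this only removes properly paired brackets of the same kind, so it gives different results on interleaved input such as `[(b]`, which it leaves untouched.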

score 16 · Accepted Answer

Here is a solution similar to @pradyunsg's answer (it works with arbitrarily nested brackets):

def remove_text_inside_brackets(text, brackets="()[]"):
    count = [0] * (len(brackets) // 2) # count open/close brackets
    saved_chars = []
    for character in text:
        for i, b in enumerate(brackets):
            if character == b: # found bracket
                kind, is_close = divmod(i, 2)
                count[kind] += (-1)**is_close # `+1`: open, `-1`: close
                if count[kind] < 0: # unbalanced bracket
                    count[kind] = 0  # keep it
                else:  # found bracket to remove
                    break
        else: # character is not a [balanced] bracket
            if not any(count): # outside brackets
                saved_chars.append(character)
    return ''.join(saved_chars)

print(repr(remove_text_inside_brackets(
    "This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence.  '

score 14 · Accepted Answer

This should work for parentheses. A regular expression "consumes" the text it matches, so it won't work for nested parentheses.

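import re
mystring = "This is a sentence. (once a day) [twice a day]"  # example input
regex = re.compile(r".*?\((.*?)\)")
result = re.findall(regex, mystring)  # text inside each pair of parentheses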

Or this will find one pair of parentheses; just loop to find more:

start = mystring.find("(")
end = mystring.find(")", start + 1)  # look for ")" only after the "("
if start != -1 and end != -1:
    result = mystring[start+1:end]
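The "just loop to find more" step could be sketched as follows. This is one possible reading of it, not the answer author's code: the function name `find_parenthesized` and the sample string are assumptions, and like the snippet above it does not handle nested parentheses.

```python
def find_parenthesized(text):
    """Collect the text inside each non-nested pair of parentheses."""
    results = []
    pos = 0
    while True:
        start = text.find("(", pos)
        if start == -1:
            break  # no more opening parentheses
        end = text.find(")", start + 1)
        if end == -1:
            break  # opening parenthesis without a closing one
        results.append(text[start + 1:end])
        pos = end + 1  # continue searching after this pair
    return results

mystring = "This is (once a day) and (twice a day) [not this]"
print(find_parenthesized(mystring))  # ['once a day', 'twice a day']
```

Searching for the closing parenthesis from `start + 1` avoids picking up a stray `)` that appears before the opening one.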

score 2 · Accepted Answer

You can split, filter, and join the string again. If your brackets are well defined, the following code should do.

import re
x = "".join(re.split(r"\(|\)|\[|\]", x)[::2])
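A short sketch of why the "well defined" condition matters: splitting on every bracket character puts the text outside brackets at the even indices only when brackets are balanced and not nested. The wrapper name `strip_by_split` and the second sample string are made up for illustration.

```python
import re

def strip_by_split(text):
    # Split on every bracket character; with flat, balanced brackets the
    # pieces alternate outside, inside, outside, ... so [::2] keeps the outside.
    parts = re.split(r"\(|\)|\[|\]", text)
    return "".join(parts[::2])

print(repr(strip_by_split("This is a sentence. (once a day) [twice a day]")))
# 'This is a sentence.  '

# With nesting the alternation breaks: "c" sits inside brackets but
# lands on an even index and is kept.
print(repr(strip_by_split("a (b [c] d) e")))
# 'a c e'
```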
