I need to get the text that is not enclosed in angle brackets.

My input looks like this:

> whatever something<X="Y" zzz="abc">this is a foo bar <this is a
> < whatever>and i ><only want this

The desired output is:

> whatever something
> this is a foo bar <this is a
> 
> and i ><only want this

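I tried to detect what is inside the brackets first and then remove it. But it seems I am matching the contents inside `<>` rather than the whole tag `<...>` with its attributes. How can I get the desired output?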

import re
x = """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this"""
re.findall("<([^>]*)>", x.strip())
['X="Y" zzz="abc"', 'this is a\n    ', ' whatever']

1 Answer

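Move the angle brackets inside the parentheses in your regex pattern (replacing the parentheses you already have) so that each match contains the whole `<...>`, brackets included. You also need to exclude the `\n` character from the match to get the output you want.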

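import re

x =  """whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n\
        < whatever>and i ><only want this"""
# Match each complete <...> tag, brackets included; excluding \n keeps a match on one line.
y = re.findall("(<[^>\n]*>)", x.strip())
z = x  # strings are immutable, so no copy is needed
for i in y:
    z = z.replace(i, '\n')  # replace every matched tag with a newline
print(z)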
whatever something
this is a foo bar <this is a

and i ><only want this
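
As a possible alternative to the `findall` plus `replace` loop, the same pattern can be passed to `re.sub`, which does the replacement in one call. This is only a sketch: `strip_tags` is a hypothetical helper name, and the input is the question's string without the extra indentation on the second line.

```python
import re

def strip_tags(text):
    # Replace every <...> run that contains no '>' and no newline with a newline.
    return re.sub(r"<[^>\n]*>", "\n", text)

x = 'whatever something<X="Y" zzz="abc">this is a foo bar <this is a\n< whatever>and i ><only want this'
result = strip_tags(x)
print(result)
```

The unmatched `<this is a` and `<only want this` are left alone because neither is closed by a `>` on the same line.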

The parentheses in the pattern tell `findall` which part of each match to capture as a group and return.
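
To illustrate how the group placement changes what `re.findall` returns, here is a small sketch on a made-up string (`s` and the variable names are illustrative only, not part of the question):

```python
import re

s = 'a<b>c<d e>f'

# No group: findall returns each whole match.
no_group = re.findall(r"<[^>\n]*>", s)

# Group inside the brackets: only the text between < and > is returned.
inner = re.findall(r"<([^>\n]*)>", s)

# Group around the brackets: the brackets are part of each result.
outer = re.findall(r"(<[^>\n]*>)", s)

print(no_group)  # ['<b>', '<d e>']
print(inner)     # ['b', 'd e']
print(outer)     # ['<b>', '<d e>']
```

With a single group around the entire pattern, the result is the same as using no group at all.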

Answered 2013-10-15T11:06:25.603