Please forgive me on two points:
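First, I'm proposing a solution that isn't based purely on regexes. I know you asked for a pure-regex solution. But I ran into an interesting problem and quickly concluded that solving it with regexes alone would be overly complicated, and I couldn't come up with a pure-regex answer. Here is what I found instead; maybe it can give you some ideas.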
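Second, I don't know C# or .NET, only Python. Since regexes are nearly the same in all languages, I thought I could answer with regexes alone, which is why I started working on this problem. I'm presenting my solution in Python anyway, because I think it's easy to understand regardless.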
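I think it is hard to capture all the wanted characters in a text with a single regex, because finding several characters on several lines looks to me like the problem of finding nested matches inside a match (maybe I'm just not skilled enough with regexes).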
So I thought it better to first find the wanted characters on every line, put them in a list, and then pick the wanted lines by slicing that list.
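A minimal sketch of this two-step idea on the sample text used further down, assuming the "characters 3-6 of lines 3-5" case; `text`, `per_line` and `selected` are just illustrative names:

```python
import re

text = """line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6"""

# Step 1: one capture per line. '.' never matches '\n', so each match
# stays inside a single line and findall returns one item per line.
per_line = re.findall(r'.{2}(.{4}).*(?:\n|$)', text)

# Step 2: keep lines 3-5 (1-based, inclusive) by slicing the list.
selected = per_line[2:5]
print(selected)  # ['ne3 ', 'ne4 ', 'ne5 ']
```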
For finding the characters within a line, a regex seemed fine to me at first. Hence the solution with the function selectRE().
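When the characters are counted from the end of a line, the per-line regex relies on the greedy `.*` backtracking. Here is a small sketch on a made-up two-line string (my own example, not taken from the question) that shows the effect:

```python
import re

text = "abcdefgh\nijklmnop"

# The greedy '.*' first runs to the end of the line, then gives characters
# back until '(.{4})' can match followed by '\n' or the end of the string,
# so the group holds the last 4 characters of each line.
last4 = re.findall(r'.*(.{4})(?:\n|$)', text)
print(last4)  # ['efgh', 'mnop']

# Putting '.{2}' after the group skips the final 2 characters instead.
before_last2 = re.findall(r'.*(.{4}).{2}(?:\n|$)', text)
print(before_last2)  # ['cdef', 'klmn']

# Caveat: a line too short for the pattern yields no match at all,
# so the list is no longer aligned one-to-one with the lines.
short = re.findall(r'.*(.{4})(?:\n|$)', "abcdefgh\nxy\nijklmnop")
print(short)  # ['efgh', 'mnop']
```

The last case is one situation where selectRE() and select() would not return the same list: the regex silently drops the short line, while slicing still returns something for it.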
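Later I realized that selecting characters in a line is the same as slicing the line at the right indices, just like slicing a list. Hence the function select().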
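To check that a regex capture and a slice really pick the same characters, here is a small brute-force comparison on one sample line, over every valid pair of bounds for the 'to' and 'before' cases (a sketch; the loops and names are mine):

```python
import re

line = "line1 blah 1"
n = len(line)

# 'to': characters a..b (1-based, inclusive) <=> line[a-1:b]
for a in range(1, n + 1):
    for b in range(a, n + 1):
        m = re.match('.{%d}(.{%d})' % (a - 1, b - a + 1), line)
        assert m.group(1) == line[a - 1:b]

# 'before': a characters before the last b characters <=> line[n-a-b:n-b]
for a in range(1, n + 1):
    for b in range(0, n - a + 1):
        m = re.match('.*(.{%d}).{%d}$' % (a, b), line)
        assert m.group(1) == line[n - a - b:n - b]

print("regex captures and slices agree")
```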
I give both solutions together, so you can check that the two functions return equal results.
```python
import re

def selectRE(a, which_chars, b, x, which_lines, y, ch):
    """Select characters from selected lines with one regex applied line by line."""
    ch = ch[:-1] if ch.endswith('\n') else ch  # to obtain an exact number of lines
    NL = ch.count('\n') + 1                    # number of lines

    def pat(a, which_chars, b):
        # '.' never matches '\n', so each match stays inside a single line
        if which_chars == 'to':        # characters a to b (1-based, inclusive)
            p = ('.{' + str(a-1) + '}' if a else '') + '(.{' + str(b-a+1) + '}).*(?:\n|$)'
        elif which_chars == 'before':  # a characters before the last b characters
            p = '.*(.{' + str(a) + '})' + ('.{' + str(b) + '}' if b else '') + '(?:\n|$)'
        elif which_chars == 'after':   # a characters after the first b characters
            p = ('.{' + str(b) + '}' if b else '') + '(.{' + str(a) + '}).*(?:\n|$)'
        else:
            raise ValueError("which_chars must be 'to', 'before' or 'after'")
        print(repr(p))                 # show the generated pattern
        return re.compile(p)

    # turn the line specification into 0-based slice bounds [x:y]
    if which_lines == 'to':       x = x - 1                  # lines x to y (1-based, inclusive)
    elif which_lines == 'before': x, y = NL - x - y, NL - y  # x lines before the last y lines
    elif which_lines == 'after':  x, y = y, y + x            # x lines after the first y lines
    return pat(a, which_chars, b).findall(ch)[x:y]

def select(a, which_chars, b, x, which_lines, y, ch):
    """Same selection with plain string slicing instead of a regex."""
    ch = ch[:-1] if ch.endswith('\n') else ch  # to obtain an exact number of lines
    NL = ch.count('\n') + 1                    # number of lines
    # turn the character specification into 0-based slice bounds [a:b]
    if which_chars == 'to':      a = a - 1
    elif which_chars == 'after': a, b = b, a + b
    # turn the line specification into 0-based slice bounds [x:y]
    if which_lines == 'to':       x = x - 1
    elif which_lines == 'before': x, y = NL - x - y, NL - y
    elif which_lines == 'after':  x, y = y, y + x
    return [line[len(line)-a-b:len(line)-b] if which_chars == 'before' else line[a:b]
            for i, line in enumerate(ch.splitlines()) if x <= i < y]

ch = '''line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6
'''

print(ch)
print('Characters 3-6 of lines 3-5. A total of 3 matches.')
print(selectRE(3, 'to', 6, 3, 'to', 5, ch))
print(select(3, 'to', 6, 3, 'to', 5, ch))
print()
print('Characters 1-5 of lines 4-5. A total of 2 matches.')
print(selectRE(1, 'to', 5, 4, 'to', 5, ch))
print(select(1, 'to', 5, 4, 'to', 5, ch))
print()
print('7 characters before the last 3 chars of lines 2-6. A total of 5 matches.')
print(selectRE(7, 'before', 3, 2, 'to', 6, ch))
print(select(7, 'before', 3, 2, 'to', 6, ch))
print()
print('6 characters before the last 2 characters of the 3 lines before the last 3 lines. A total of 3 matches.')
print(selectRE(6, 'before', 2, 3, 'before', 3, ch))
print(select(6, 'before', 2, 3, 'before', 3, ch))
print()
print('Last 4 characters of the 2 lines before the last line. A total of 2 matches.')
print(selectRE(4, 'before', 0, 2, 'before', 1, ch))
print(select(4, 'before', 0, 2, 'before', 1, ch))
print()
print('Last character of the last 4 lines. A total of 4 matches.')
print(selectRE(1, 'before', 0, 4, 'before', 0, ch))
print(select(1, 'before', 0, 4, 'before', 0, ch))
print()
print('7 characters before the last 3 chars of the 3 lines after the first 2 lines. A total of 3 matches.')
print(selectRE(7, 'before', 3, 3, 'after', 2, ch))
print(select(7, 'before', 3, 3, 'after', 2, ch))
print()
print('5 characters before the last 3 chars of the first 5 lines. A total of 5 matches.')
print(selectRE(5, 'before', 3, 5, 'after', 0, ch))
print(select(5, 'before', 3, 5, 'after', 0, ch))
print()
print('Characters 3-6 of the first 4 lines. A total of 4 matches.')
print(selectRE(3, 'to', 6, 4, 'after', 0, ch))
print(select(3, 'to', 6, 4, 'after', 0, ch))
print()
print('9 characters after the first 2 chars of the 3 lines after the first line. A total of 3 matches.')
print(selectRE(9, 'after', 2, 3, 'after', 1, ch))
print(select(9, 'after', 2, 3, 'after', 1, ch))
```

Result:

```
line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

Characters 3-6 of lines 3-5. A total of 3 matches.
'.{2}(.{4}).*(?:\n|$)'
['ne3 ', 'ne4 ', 'ne5 ']
['ne3 ', 'ne4 ', 'ne5 ']

Characters 1-5 of lines 4-5. A total of 2 matches.
'.{0}(.{5}).*(?:\n|$)'
['line4', 'line5']
['line4', 'line5']

7 characters before the last 3 chars of lines 2-6. A total of 5 matches.
'.*(.{7}).{3}(?:\n|$)'
['ne2 bla', 'ne3 bla', 'ne4 bla', 'ne5 bla', 'ne6 bla']
['ne2 bla', 'ne3 bla', 'ne4 bla', 'ne5 bla', 'ne6 bla']

6 characters before the last 2 characters of the 3 lines before the last 3 lines. A total of 3 matches.
'.*(.{6}).{2}(?:\n|$)'
['1 blah', '2 blah', '3 blah']
['1 blah', '2 blah', '3 blah']

Last 4 characters of the 2 lines before the last line. A total of 2 matches.
'.*(.{4})(?:\n|$)'
['ah 4', 'ah 5']
['ah 4', 'ah 5']

Last character of the last 4 lines. A total of 4 matches.
'.*(.{1})(?:\n|$)'
['3', '4', '5', '6']
['3', '4', '5', '6']

7 characters before the last 3 chars of the 3 lines after the first 2 lines. A total of 3 matches.
'.*(.{7}).{3}(?:\n|$)'
['ne3 bla', 'ne4 bla', 'ne5 bla']
['ne3 bla', 'ne4 bla', 'ne5 bla']

5 characters before the last 3 chars of the first 5 lines. A total of 5 matches.
'.*(.{5}).{3}(?:\n|$)'
['1 bla', '2 bla', '3 bla', '4 bla', '5 bla']
['1 bla', '2 bla', '3 bla', '4 bla', '5 bla']

Characters 3-6 of the first 4 lines. A total of 4 matches.
'.{2}(.{4}).*(?:\n|$)'
['ne1 ', 'ne2 ', 'ne3 ', 'ne4 ']
['ne1 ', 'ne2 ', 'ne3 ', 'ne4 ']

9 characters after the first 2 chars of the 3 lines after the first line. A total of 3 matches.
'.{2}(.{9}).*(?:\n|$)'
['ne2 blah ', 'ne3 blah ', 'ne4 blah ']
['ne2 blah ', 'ne3 blah ', 'ne4 blah ']
```
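The line arithmetic shared by selectRE() and select() is probably the least obvious part. Here is a sketch that isolates it in a hypothetical helper `line_bounds()` (the name is mine, not from the code above). Unlike the two functions, it clamps the start at 0, because when x + y exceeds the number of lines a negative start would be read by Python as counting from the end of the list:

```python
def line_bounds(x, mode, y, nl):
    """Hypothetical helper: convert a line specification into
    0-based slice bounds (start, stop) over a list of nl lines.

    'to'     -> lines x..y, 1-based and inclusive
    'before' -> the x lines that come before the last y lines
    'after'  -> the x lines that come after the first y lines
    """
    if mode == 'to':
        return x - 1, y
    if mode == 'before':
        # clamp at 0: a negative start would count from the end of the list
        return max(0, nl - x - y), nl - y
    if mode == 'after':
        return y, y + x
    raise ValueError("mode must be 'to', 'before' or 'after'")


lines = ['line%d' % i for i in range(1, 7)]  # six sample lines
for spec in [(3, 'to', 5), (3, 'before', 3), (2, 'before', 1),
             (4, 'before', 0), (3, 'after', 2), (5, 'before', 3)]:
    start, stop = line_bounds(*spec, nl=len(lines))
    print(spec, lines[start:stop])
```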
Now I'm going to study Tim Pietzcker's tricky solution.