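I want a regular expression that matches text containing both alphabetic and numeric characters, but I don't want it to match text that consists only of letters or only of digits. For example, in Python: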
s = '[mytaskid: 3fee46d2]: STARTED at processing job number 10022001'
#               ^^^^^^^^ <- I want something that'll only match this part.
import re
rr = re.compile('([0-9a-z]{8})')
print('sub=', rr.sub('########', s))
print('findall=', rr.findall(s))
This produces the following output:
sub= [########: ########]: STARTED at ########ng job number ########
findall= ['mytaskid', '3fee46d2', 'processi', '10022001']
I would like it to be:
sub= [mytaskid: ########]: STARTED at processing job number 10022001
findall= ['3fee46d2']
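Any ideas? In this case the token is always exactly 8 characters long, but it would be even better to have a regex without the {8} in it, i.e. one that still matches when the token has more or fewer than 8 characters.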
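To make the cause of the extra matches concrete, here is a small sketch. The `whole_words` pattern is only a hypothetical intermediate step, not a solution: adding `\b` removes the partial match inside "processing", but a single character class still cannot require that both letters and digits are present.

```python
import re

s = '[mytaskid: 3fee46d2]: STARTED at processing job number 10022001'

# Original pattern (without the capture group): any 8 consecutive
# characters from the combined class [0-9a-z].
unanchored = re.compile(r'[0-9a-z]{8}')

# Hypothetical intermediate step: same class, limited to whole words.
whole_words = re.compile(r'\b[0-9a-z]{8}\b')

print(unanchored.findall(s))
# ['mytaskid', '3fee46d2', 'processi', '10022001']
print(whole_words.findall(s))
# ['mytaskid', '3fee46d2', '10022001']
```

The all-letter and all-digit tokens still match in the second case, which is the part the question is really about.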
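- Edit -
The question is more about understanding whether there is a way to write a regex that combines 2 patterns (in this case [0-9] and [a-z]) and ensures that the matched string satisfies both of them, while the number of characters matched from each set is variable. E.g. s could also be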
s = 'mytaskid 3fee46d2 STARTED processing job number 10022001'
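One way to combine the two requirements is a lookahead, as in one of the patterns benchmarked further down. The sketch below assumes tokens are lowercase and delimited by word boundaries; the name `mixed` is just illustrative. The lookahead checks that the word contains a digit, and the body of the pattern forces at least one letter, so no fixed length is needed.

```python
import re

# Lookahead: the word contains at least one digit.
# Body: the word is made of [a-z0-9] and contains at least one letter.
mixed = re.compile(r'\b(?=[a-z0-9]*[0-9])[a-z0-9]*[a-z][a-z0-9]*\b')

samples = [
    '[mytaskid: 3fee46d2]: STARTED at processing job number 10022001',
    'mytaskid 3fee46d2 STARTED processing job number 10022001',
]

for text in samples:
    print('findall=', mixed.findall(text))
    print('sub=', mixed.sub('########', text))
```

For both sample strings, findall() returns only ['3fee46d2']. Note that uppercase tokens would need `re.IGNORECASE` or a wider character class.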
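- Answers -
Thanks to everyone for the answers. They all give me what I wanted, so everyone gets a +1 and the first person to answer gets the accepted answer, although Jerry explained it best. :)
For anyone who is a stickler for performance: there is nothing to choose between them, they all take about the same time.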
import re
from timeit import timeit

s = '[mytaskid: 3fee46d2]: STARTED at processing job number 10022001'
#               ^^^^^^^^ <- I want something that'll only match this part.

def testIt(regEx):
    s = '[mytaskid: 3333fe46d2]: STARTED at processing job number 10022001'
    # Check that the regex under test replaces only the mixed letter/digit token.
    assert (re.sub(regEx, '########', s) ==
            '[mytaskid: ########]: STARTED at processing job number 10022001'), '"%s" does not work.' % regEx
    # The raw string in the generated setup code keeps \b as a word boundary;
    # in a plain string literal it would be read as a backspace character.
    setup = '''
import re
s = '%s'
rr = re.compile(r'%s')
''' % (s, regEx)
    print("sub() with '", regEx, "': ", timeit("rr.sub('########', s)", number=500000, setup=setup))
    print("findall() with '", regEx, "': ", timeit('rr.findall(s)', setup=setup))

testIt(r'\b[0-9a-z]*(?:[a-z][0-9]|[0-9][a-z])[0-9a-z]*\b')
testIt(r'\b[a-z\d]*(?:\d[a-z]|[a-z]\d)[a-z\d]*\b')
testIt(r'\b(?=[a-z0-9]*[0-9])[a-z0-9]*[a-z][a-z0-9]*\b')
testIt(r'\b(?=[0-9]*[a-z])(?=[a-z]*[0-9])[a-z0-9]+\b')
Produces:
sub() with ' \b[0-9a-z]*(?:[a-z][0-9]|[0-9][a-z])[0-9a-z]*\b ': 0.328042736387
findall() with ' \b[0-9a-z]*(?:[a-z][0-9]|[0-9][a-z])[0-9a-z]*\b ': 0.350668751542
sub() with ' \b[a-z\d]*(?:\d[a-z]|[a-z]\d)[a-z\d]*\b ': 0.314759661193
findall() with ' \b[a-z\d]*(?:\d[a-z]|[a-z]\d)[a-z\d]*\b ': 0.35618526928
sub() with ' \b(?=[a-z0-9]*[0-9])[a-z0-9]*[a-z][a-z0-9]*\b ': 0.322802906619
findall() with ' \b(?=[a-z0-9]*[0-9])[a-z0-9]*[a-z][a-z0-9]*\b ': 0.35330467656
sub() with ' \b(?=[0-9]*[a-z])(?=[a-z]*[0-9])[a-z0-9]+\b ': 0.320779061371
findall() with ' \b(?=[0-9]*[a-z])(?=[a-z]*[0-9])[a-z0-9]+\b ': 0.347522144274
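For anyone who wants to re-run the comparison, here is a minimal sketch that passes callables to timeit instead of building setup strings, which avoids any escaping issues with the backslashes. The helper name `time_pattern` and the default repetition count are assumptions made for illustration; timings will vary by machine and Python version, so no numbers are shown.

```python
import re
from timeit import timeit

s = '[mytaskid: 3333fe46d2]: STARTED at processing job number 10022001'

patterns = [
    r'\b[0-9a-z]*(?:[a-z][0-9]|[0-9][a-z])[0-9a-z]*\b',
    r'\b[a-z\d]*(?:\d[a-z]|[a-z]\d)[a-z\d]*\b',
    r'\b(?=[a-z0-9]*[0-9])[a-z0-9]*[a-z][a-z0-9]*\b',
    r'\b(?=[0-9]*[a-z])(?=[a-z]*[0-9])[a-z0-9]+\b',
]

def time_pattern(pattern, text, number=10000):
    """Hypothetical helper: return (sub_seconds, findall_seconds) for one pattern."""
    rr = re.compile(pattern)
    sub_time = timeit(lambda: rr.sub('########', text), number=number)
    findall_time = timeit(lambda: rr.findall(text), number=number)
    return sub_time, findall_time

for pattern in patterns:
    # Every candidate should find the same single token before it is timed.
    assert re.findall(pattern, s) == ['3333fe46d2']
    print(pattern, time_pattern(pattern, s))
```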