
I have strings like this:

This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip
This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip

In these strings I want to replace the number (which is sometimes hexadecimal) with "my_doc". I tried:

import re
match = re.findall(r"[\.0-9]*", text)
print(match)

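But it only works for decimal digits. It should also work for hexadecimal numbers, replace them with "my_doc", and print the whole line, giving this output: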

This changes are related to book:id:pages:my_doc location /file1/file2/file3/pages.my_doc.zip
This changes are related to book:id:pages:my_doc location /file1/file2/file3/pages.my_doc.zip
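A quick check of the attempt above on one of the sample lines (a sketch for illustration) shows two problems: the pattern [\.0-9]* can match the empty string, so re.findall returns many empty strings, and since the class contains no a-f, a hex id such as 30ab00e is split into pieces.

```python
import re

# One of the sample lines from the question.
text = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

# [\.0-9]* can match zero characters, so findall also returns an empty
# string at every position where no digit or dot starts.
found = re.findall(r"[\.0-9]*", text)

# Dropping the empty strings shows what was actually matched: the hex id is
# split at the letters, and the dots around 000 are swallowed.
print([m for m in found if m])  # → ['30', '00', '1', '2', '3', '.000.']
```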

3 Answers


You could try something like this:

In [8]: import re


In [14]: strs="This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"

In [15]: re.findall(r"\d+[A-Fa-f]{0,}\d+[A-Fa-f]{0,}",strs)

Out[15]: ['3000', '000']

In [16]: strs1="This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

In [17]: re.findall(r"\d+[A-Fa-f]{0,}\d+[A-Fa-f]{0,}",strs1)

Out[17]: ['30ab00e', '000']

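For the replacement, use re.sub(). Note that the (\d+) alternative also matches the single digits in file1, file2 and file3, so those are replaced too: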

In [68]: strs="This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"

In [69]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",strs)

Out[69]: 'This changes are related to book:id:pages:my_doc location /filemy_doc/filemy_doc/filemy_doc/pages.my_doc.zip'

In [70]: strs1="This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

In [71]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",strs1)
Out[71]: 'This changes are related to book:id:pages:my_doc location /filemy_doc/filemy_doc/filemy_doc/pages.my_doc.zip'

In [72]: foo=" number of pages completed, 2 still pending" 

In [73]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",foo)
Out[73]: ' number of pages completed, my_doc still pending'
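One possible way to keep file1, file2 and file3 intact, sketched here as an assumption rather than the answerer's code: keep the first alternative with the full hex range a-f, and only replace a lone number when it is a whole word (\b\d+\b). The function name replace_numbers is hypothetical.

```python
import re

# Alternative 1: at least two decimal digits, with optional hex letters a-f
# after each digit run (the answer's shape, full a-f range).
# Alternative 2: a lone number, only when it is a whole word, so the digit
# in "file1" (preceded by the word character "e") is not replaced.
PATTERN = re.compile(r"\d+[A-Fa-f]*\d+[A-Fa-f]*|\b\d+\b")

def replace_numbers(line):
    """Hypothetical helper: return line with each matched number replaced by 'my_doc'."""
    return PATTERN.sub("my_doc", line)

print(replace_numbers("This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"))
print(replace_numbers(" number of pages completed, 2 still pending"))
```

Caveat: the first alternative still has no word boundaries, so the 12 in a name like file12 would be replaced.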
answered 2012-12-07T05:10:57.820

This is crazy (as is your question) and hackish!

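Hex characters (a-f, A-F) appear in many places in the string, so those would be replaced too (though the question doesn't rule that out atm ;)), which does not seem to be the intended behavior.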

Assuming the blob/part to be replaced is a hex word, and assuming it is at least 3 characters long, consider:

import re
from string import hexdigits


str_1 = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"

str_2 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

expression = '[%s]{3,}' % hexdigits  # = '[' + hexdigits + ']{3,}'
print(re.sub(expression, 'my_doc', str_1))
print(re.sub(expression, 'my_doc', str_2))
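The minimum length only partly addresses the concern about hex letters in ordinary words: any run of three or more characters from 0-9a-fA-F is replaced, including English words spelled only with a-f. A small check (the sample sentence is made up for illustration):

```python
import re
from string import hexdigits  # '0123456789abcdefABCDEF'

expression = '[%s]{3,}' % hexdigits

# "decade" and "added" are spelled only with the letters a-f, so the
# length-3 rule alone does not protect them.
result = re.sub(expression, 'my_doc', "the decade was added")
print(result)  # → the my_doc was my_doc
```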

EDIT: OK, for a slightly less crazy regex, use the expression below:

expression = r':[%s]+\S' % hexdigits
print(re.sub(expression, ':my_doc', str_1))  # the ':' is part of the match, so restore it

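This matches only hex words that directly follow a colon, so a minimum length of 3 is no longer needed. Because the colon is part of the match, it has to be put back in the replacement (':my_doc'). Note that the .000 in the file name is not preceded by a colon and is therefore left unchanged.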
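Building on the hex-word assumption, another option (not taken from the answers here) is to require that the whole word consist of hex digits and contain at least one decimal digit. The \b anchors stop a match from starting or ending inside a word such as file1, and the required digit keeps words like decade or added from matching. The name HEX_ID is hypothetical. A sketch:

```python
import re

# \b ... \b                  : the match must be a whole word
# [0-9a-fA-F]*\d[0-9a-fA-F]* : hex digits only, at least one of them 0-9
HEX_ID = re.compile(r"\b[0-9a-fA-F]*\d[0-9a-fA-F]*\b")

lines = [
    "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip",
    "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip",
]
for line in lines:
    print(HEX_ID.sub("my_doc", line))
```

Both lines print as the question's expected output. A word made of hex letters plus a digit (for example face1) would still be replaced.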

answered 2012-12-07T10:02:09.813

Consider using conditionals in your regex: http://www.asiteaboutnothing.net/regex/regex-conditionals.html
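Python's re module supports conditionals with the (?(group)yes|no) syntax. As a hypothetical illustration only (the answer gives no pattern, and the rule here is an assumption): an id that follows ':' must be followed by whitespace, and an id that follows '.' must be followed by another '.'. The name replace_ids is hypothetical.

```python
import re

# (?:(:)|\.) : the id is preceded by ':' (captured as group 1) or by '.'
# (?(1)A|B)  : conditional; if group 1 matched, A must follow, otherwise B
# Assumed rule: after ':' the id ends at whitespace, after '.' at a '.'.
PATTERN = re.compile(r"(?:(:)|\.)[0-9a-fA-F]+(?(1)(?=\s)|(?=\.))")

def replace_ids(line):
    # Keep the separator that was matched (':' or '.') and swap in my_doc.
    return PATTERN.sub(lambda m: m.group(0)[0] + "my_doc", line)

print(replace_ids("This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"))
```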

answered 2012-12-07T05:09:34.637