python - Replace found items with regexp used.

Question

First I search for the numbers and replace each match with a regexp that would match it.

Then I take the resulting string, search for the whitespace, and replace each match with a regexp as well.

However, I get the wrong results.

test0 = This book id 0076 has 6782e6a
test1 = This book id 0076 has 0xef34a

I used the following regular expressions:

b = re.sub(r"(0x[a-fA-F0-9]+|\d+)","[0-9]*", test0)
c = re.sub(r'[(\s)*]','[^\s"]*',b)

My output:

test0
b = This book id [0-9]* has [0-9]*e[0-9]*a
c = This[^\s]*book[^\s]*id[^\s]*[0-9][^\s]*[^\s]*has[0-9][^\s]*e[0-9][^\s]*a

test1
b = This book id [0-9]* has [0-9]*
c = This[^\s]*book[^\s]*id[^\s]*[0-9][^\s]*[^\s]*has[0-9][^\s]*

Expected output:

test0
b = This book id [0-9]* has [0-9]*
c = This[^\s]*book[^\s]*id[^\s]*[0-9]*[^\s]*has[^\s]*[0-9]*

test1
b = This book id [0-9]* has [0-9]*
c = This[^\s]*book[^\s]*id[^\s]*[0-9]*[^\s]*has[^\s]*[0-9]*

score 0 · Accepted Answer

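The second alternative in the regex "(0x[a-fA-F0-9]+|\d+)", namely \d+, matches twice inside the string "6782e6a" (once on "6782" and once on "6"), giving you the output "[0-9]*" + "e" + "[0-9]*" + "a".

I suggest you change your regex to something like "\b(?:0x)?[a-fA-F0-9]+\b", which matches a whole token of hex digits with an optional 0x prefix. (A character class containing the range a-Z is rejected by Python as a bad character range, so the letters have to be listed as a-f and A-F.)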
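To see the double match concretely, here is a minimal sketch that lists what the original pattern finds in each test string and compares it with a whole-token pattern. The name whole_token and the use of \b word boundaries are my own assumptions for illustration, not something taken from the question.

```python
import re

test0 = "This book id 0076 has 6782e6a"
test1 = "This book id 0076 has 0xef34a"

# The pattern from the question: \d+ stops at the first hex letter,
# so "6782e6a" yields two separate matches, "6782" and "6".
original = r"(0x[a-fA-F0-9]+|\d+)"
print(re.findall(original, test0))

# Assumed alternative: match a whole token made of hex digits,
# with an optional 0x prefix, delimited by word boundaries.
whole_token = r"\b(?:0x)?[0-9a-fA-F]+\b"
print(re.findall(whole_token, test0))
print(re.sub(whole_token, "[0-9]*", test0))
print(re.sub(whole_token, "[0-9]*", test1))
```

Note that this pattern would also match ordinary words made only of the letters a to f, so it only suits input where such words do not occur.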

score 0 · Accepted Answer

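If your strings all have the same format, you can also use a positive lookbehind assertion and simply select the tokens that follow 'id' and 'has'. That way you don't have to come up with a complicated regular expression.

Here is how: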

>>> import re
>>> a = 'This book id 0076 has 6782e6a'
>>> b = re.sub(r'(?<=id\s)\w+', '[0-9]*', a)
>>> b
'This book id [0-9]* has 6782e6a'
>>> c = re.sub(r'(?<=has\s)\w+', '[0-9]*', b)
>>> c
'This book id [0-9]* has [0-9]*'
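If there are several such keywords, the two substitutions can be folded into one pass. The sketch below is only one possible way to do that; mask_after, keywords and placeholder are hypothetical names of mine. It captures the keyword instead of using a lookbehind, because Python's re module requires lookbehind patterns to have a fixed width, and 'id' and 'has' differ in length.

```python
import re

def mask_after(text, keywords=("id", "has"), placeholder="[0-9]*"):
    """Replace the token that follows any of the given keywords."""
    # Build \b(id|has)(\s+)\w+ ; the keyword and the whitespace are captured
    # so that they can be written back unchanged.
    pattern = r"\b(" + "|".join(map(re.escape, keywords)) + r")(\s+)\w+"
    # A function replacement is inserted literally, so the placeholder
    # needs no escaping.
    return re.sub(pattern, lambda m: m.group(1) + m.group(2) + placeholder, text)

print(mask_after("This book id 0076 has 6782e6a"))
print(mask_after("This book id 0076 has 0xef34a"))
```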

score 0 · Accepted Answer

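On a second read, I realised that what you are missing is a way to match the xf part of 0xf, and indeed the letters in any hex string.

Although I'm not sure exactly what you are trying to do, perhaps you need to use grouping in re.match to keep parts of the string from being matched as hex, for example: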

In [16]: re.match("(0x[0-9a-fA-F]+)(hello)", "0xfhello").groups()
Out[16]: ('0xf', 'hello')

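Old answer

It looks like the second number may be either decimal, [0-9]+, or hexadecimal, 0x[0-9a-fA-F]+, so your regex should look like this (the hex alternative comes first, because alternatives are tried left to right and [0-9]+ would otherwise consume the leading 0 of a 0x prefix):

(0x[0-9a-fA-F]+)|([0-9]+)

However, if you are in a hurry, you might get away with collapsing the two into a single, imprecise regex: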

[0-9a-fA-Fx]+
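The behaviour of these patterns is easy to check with re.findall. The sketch below uses a sample string of my own that combines both number styles from the question, and the variable names are illustrative only. It shows why the order of the alternatives matters and how imprecise the collapsed character class is.

```python
import re

sample = "This book id 0076 has 0xef34a"

# Alternatives are tried left to right, so with the decimal branch first
# the leading "0" of "0xef34a" is consumed as a decimal number.
decimal_first = r"[0-9]+|0x[0-9a-fA-F]+"
hex_first = r"0x[0-9a-fA-F]+|[0-9]+"
print(re.findall(decimal_first, sample))
print(re.findall(hex_first, sample))

# The collapsed class also picks up stray hex letters inside ordinary words.
imprecise = r"[0-9a-fA-Fx]+"
print(re.findall(imprecise, "This book id 0076 has 6782e6a"))
```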

score 0 · Accepted Answer

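The first regex could be (?<=\s)(0x)?[0-9a-fA-F]+(?=\s|$).

The second number in the string is a hexadecimal number without a leading 0x. If you can be sure that hexadecimal numbers always start with 0x, then it can be (0x[0-9a-fA-F]+)|\d+.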

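There is one catch: if you don't put a 0x before a hexadecimal number, the pattern may end up matching an English word such as cafe or dead. You should put a 0x before hexadecimal values.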

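The second one can simply be (\s)+. When you put a regex between [ and ], the characters inside are all treated as separate alternatives: [ab] means a single instance of a or b. So [(\s)*] matches a single (, whitespace character, ) or *. That is why your regex replaces the literal * characters, as well as the spaces, with [^\s]*.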
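A short sketch of the whitespace step, assuming the first substitution has already produced the string b shown in the expected output. The function replacement is my own choice: recent Python versions (3.7 and later) reject an unknown escape such as \s inside a replacement template, whereas the return value of a function is inserted literally.

```python
import re

b = "This book id [0-9]* has [0-9]*"

# Inside [...] every character is a separate alternative, so this class
# matches one of "(", a whitespace character, ")" or "*".
print(re.findall(r"[(\s)*]", "a (b)*"))

# Matching runs of whitespace only leaves the "*" in "[0-9]*" untouched.
c = re.sub(r"\s+", lambda m: r"[^\s]*", b)
print(c)
```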
