I have a question about lookarounds in Python:

>>> spacereplace = re.compile(b'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
>>> q = "a b (c or d)"    
>>> q = spacereplace.sub(" and ", q)
>>> q
# What is meant to happen:
'a and b and (c or d)'

# What instead happens
'a and b and (c and or and d)'

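The regex is supposed to match any whitespace that is not next to the word "and" or "or", but this doesn't seem to work.

Can anyone help me with this?

EDIT: In response to a commenter, I have broken the regex down across multiple lines.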

(?<!\band) # Looks behind the \s, matching if there isn't a word break, followed by "and", there.
(?<!\bor)  # Looks behind the \s, matching if there isn't a word break, followed by "or", there.
\s         # Matches a single whitespace character.
(?!or\b)   # Looks after the \s, matching if there isn't the word "or", followed by a word break there.
(?!and\b)  # Looks after the \s, matching if there isn't the word "and", followed by a word break there.

1 Answer


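You have presumably confused the raw string modifier r with b. Without the r prefix, Python converts each \b in the literal into a backspace character before the regex engine sees it, so it never acts as a word boundary.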

>>> import re
>>> spacereplace = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
>>> q = "a b (c or d)"
>>> spacereplace.sub(" and ", q)
'a and b and (c or d)' 
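
To see the difference in isolation, here is a minimal sketch. It assumes Python 3, where a b'...' pattern cannot be applied to a str at all, so the broken pattern is simulated by replacing each \b with a literal backspace character. The names RAW_PATTERN and BROKEN_PATTERN are illustrative only.

```python
import re

# In a normal (non-raw) literal, \b is the backspace character, code 8.
assert '\b' == '\x08'
# In a raw literal, \b stays as two characters: a backslash and 'b'.
assert len(r'\b') == 2

RAW_PATTERN = r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)'
# Simulate what the regex engine receives when the r prefix is missing:
# every word-boundary escape has become a literal backspace.
BROKEN_PATTERN = RAW_PATTERN.replace(r'\b', '\x08')

q = "a b (c or d)"
fixed = re.sub(RAW_PATTERN, " and ", q, flags=re.I)
broken = re.sub(BROKEN_PATTERN, " and ", q, flags=re.I)

print(fixed)   # a and b and (c or d)
print(broken)  # a and b and (c and or and d)
```

Since the input contains no backspace characters, none of the negative lookarounds in the broken pattern can ever reject a match, which is why every space gets replaced.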

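Sometimes, when a regex doesn't work, it helps to debug it with the re.DEBUG flag. In this case you may notice that \b is not parsed as a word boundary (it shows up as literal 8, a backspace, instead of at at_boundary), which hints at where to look for the mistake: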

>>> spacereplace = re.compile(b'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I | re.DEBUG)
assert_not -1
  literal 8
  literal 97
  literal 110
  literal 100
assert_not -1
  literal 8
  literal 111
  literal 114
in
  category category_space
assert_not 1
  literal 111
  literal 114
  literal 8
assert_not 1
  literal 97
  literal 110
  literal 100
  literal 8


>>> spacereplace = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I | re.DEBUG)
assert_not -1
  at at_boundary
  literal 97
  literal 110
  literal 100
assert_not -1
  at at_boundary
  literal 111
  literal 114
in
  category category_space
assert_not 1
  literal 111
  literal 114
  at at_boundary
assert_not 1
  literal 97
  literal 110
  literal 100
  at at_boundary
answered 2013-05-04T11:21:13.823