python - Python regex - r prefix

Question

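Can anyone explain why example 1 below works when the r prefix is not used? I thought the r prefix had to be used whenever escape sequences are used. Examples 2 and 3 demonstrate this.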

# example 1
import re
print (re.sub('\s+', ' ', 'hello     there      there'))
# prints 'hello there there' - not expected as r prefix is not used

# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))
# prints 'hello     there' - as expected as r prefix is used

# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello     there      there'))
# prints 'hello     there      there' - as expected as r prefix is not used

score 92 · Accepted Answer

Because \ begins an escape sequence only when what follows makes it a valid escape sequence.

>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print '\n'


>>> print r'\n'
\n
>>> '\s'
'\\s'
>>> r'\s'
'\\s'
>>> print '\s'
\s
>>> print r'\s'
\s

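Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are: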

Escape Sequence   Meaning
\newline          Ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\N{name}          Character named name in the Unicode database (Unicode only)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\uxxxx            Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx (Unicode only)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh
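A quick way to check the table is to measure string lengths. The sketch below assumes Python 3, where every str is Unicode, so the "Unicode only" entries work in ordinary literals; the variable names are only illustrative.

```python
# Every recognized escape collapses to ONE character in a normal literal.
one_char = ['\\', '\'', '\"', '\a', '\b', '\f', '\n', '\r', '\t', '\v',
            '\101', '\x41']
lengths = [len(s) for s in one_char]
print(lengths)

# Octal 101 and hex 41 both name the letter A.
same_letter = ('\101' == '\x41' == 'A')

# \N{name} and \uxxxx can name the same code point.
named = ('\N{BULLET}' == '\u2022')

# With the r prefix nothing is collapsed: backslash + 'n' stay two characters.
raw_len = len(r'\n')
print(same_letter, named, raw_len)
```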

Never rely on raw strings for path literals, as raw strings have some rather peculiar inner workings, known to have bitten people in the ass:

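When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.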

To better illustrate this last point:

>>> r'\'
SyntaxError: EOL while scanning string literal
>>> r'\''
"\\'"
>>> '\'
SyntaxError: EOL while scanning string literal
>>> '\''
"'"
>>> 
>>> r'\\'
'\\\\'
>>> '\\'
'\\'
>>> print r'\\'
\\
>>> print r'\'
SyntaxError: EOL while scanning string literal
>>> print '\\'
\
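The same checks can be scripted without crashing the interpreter by handing the literal's source text to compile(). This is a minimal Python 3 sketch; is_valid_literal is a hypothetical helper, and C:\temp is just a made-up path to show one common workaround for a trailing backslash.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, '<demo>', 'eval')
        return True
    except SyntaxError:
        return False

# The source text r'\' : the backslash "escapes" the closing quote,
# so the literal never terminates.
print(is_valid_literal("r'\\'"))

# The source text r'\\' : an even number of trailing backslashes is fine.
print(is_valid_literal("r'\\\\'"))

# Workaround for a path that must end in a backslash: keep the body raw
# and append the final backslash from a normal literal.
folder = r'C:\temp' + '\\'
print(folder)
```

For real paths, os.path.join or pathlib sidestep the issue because the separator never has to be typed into a literal.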

score 41 · Accepted Answer

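The 'r' means that what follows is a "raw string", i.e. backslash characters are treated literally instead of signifying special treatment of the following character.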

http://docs.python.org/reference/lexical_analysis.html#literals

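So '\n' is a single newline
And r'\n' is two characters - a backslash and the letter 'n'
Another way to write it would be '\\n', because the first backslash escapes the second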

An equivalent way of writing this

print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))

is

print (re.sub('(\\b\\w+)(\\s+\\1\\b)+', '\\1', 'hello     there      there'))

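Because of the way Python treats characters that are not valid escape characters, not all of those double backslashes are necessary - e.g. '\s'=='\\s'. However, the same is not true for '\b' and '\\b'. My preference is to be explicit and double all the backslashes.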
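The equivalence of the two spellings above can be checked directly. A small Python 3 sketch, with variable names chosen only for illustration:

```python
import re

text = 'hello     there      there'

raw_pattern = r'(\b\w+)(\s+\1\b)+'
doubled_pattern = '(\\b\\w+)(\\s+\\1\\b)+'

# Both literals produce exactly the same string object contents...
same_pattern = (raw_pattern == doubled_pattern)

# ...so re.sub behaves identically with either spelling.
same_result = (re.sub(raw_pattern, r'\1', text)
               == re.sub(doubled_pattern, '\\1', text))

# \b is a recognized escape, so the single and doubled forms differ.
backspace_differs = ('\b' != '\\b')

print(same_pattern, same_result, backspace_differs)
```

Recent Python 3 releases also emit a warning for unrecognized escapes such as '\s' in non-raw literals, which is one more reason to either use the r prefix or double every backslash in a pattern.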

score 6 · Accepted Answer

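Not all sequences involving backslashes are escape sequences. \t and \f are, for example, but \s is not. In a non-raw string literal, any \ that is not part of an escape sequence is seen as just another \ (a literal backslash):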

>>> "\s"
'\\s'
>>> "\t"
'\t'

\b is an escape sequence, however, so example 3 fails. (And yes, some people consider this behaviour rather unfortunate.)
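To see why example 3 leaves the text untouched, it helps to spell out what its non-raw literals actually contain. The sketch below (Python 3, illustrative variable names) writes those characters explicitly instead of relying on the ambiguous escapes.

```python
import re

text = 'hello     there      there'

# In a non-raw literal, \b is a backspace (0x08) and \1 is chr(1).
assert '\b' == '\x08' and '\1' == '\x01'

# So example 3's pattern and replacement are really these strings:
# backspaces and chr(1) where the regex needed \b and \1.
example3_pattern = '(\x08\\w+)(\\s+\x01\x08)+'
example3_repl = '\x01'

# The text contains no backspace, so nothing matches and nothing changes.
unchanged = re.sub(example3_pattern, example3_repl, text)

# The raw version hands \b and \1 to the re module intact.
fixed = re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', text)

print(repr(unchanged))
print(repr(fixed))
```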

score 1 · Accepted Answer

Try it:

a = '\''
print(a)   # '
a = r'\''
print(a)   # \'
a = "\'"
print(a)   # '
a = r"\'"
print(a)   # \'

answered 2019-07-29T09:39:05.683

score 0 · Accepted Answer

Check the example below:

print r"123\n123" 
#outputs>>>
123\n123


print "123\n123"
#outputs>>>
123
123
