python - Simple case folding vs full case folding in the Python regex module

Question

This is the module I'm asking about: https://pypi.org/project/regex/, i.e. Matthew Barnett's regex.

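On the project description page, the difference in behaviour between V0 and V1 is described as follows (note the parts about case folding):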

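Old vs new behaviour

In order to be compatible with the re module, this module has 2 behaviours:

Version 0 behaviour (old behaviour, compatible with the re module):

Please note that the re module's behaviour may change over time, and I'll endeavour to match that behaviour in version 0.

Indicated by the VERSION0 or V0 flag, or (?V0) in the pattern.

Case-insensitive matches in Unicode use simple case-folding by default.

Version 1 behaviour (new behaviour, possibly different from the re module):

Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.

Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to regex.DEFAULT_VERSION.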
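To make those flags concrete, here is a minimal sketch, assuming the third-party regex package is installed (pip install regex). It selects the version both through the flags argument and inline in the pattern, and also uses the FULLCASE flag (regex.F, or (?f) inline), which the same README documents separately: it turns on full case folding for IGNORECASE matching without switching versions. The helper matches() and the sample strings are illustrative only.

```python
import regex

# "strasse" vs "straße" are equal only under *full* case folding,
# because ß (U+00DF) folds to the two-character string "ss".
PATTERN = "strasse"
TEXT = "straße"


def matches(pattern, text, flags=0):
    """Hypothetical helper: True if `pattern` matches at the start of `text`."""
    return regex.match(pattern, text, flags) is not None


# Version selected through the flags argument.
v0 = matches(PATTERN, TEXT, regex.IGNORECASE | regex.VERSION0)
v1 = matches(PATTERN, TEXT, regex.IGNORECASE | regex.VERSION1)

# The same choice expressed with inline flags in the pattern itself.
v0_inline = matches("(?iV0)" + PATTERN, TEXT)
v1_inline = matches("(?iV1)" + PATTERN, TEXT)

# Version 0 semantics, but with full case folding switched on explicitly.
v0_full = matches(PATTERN, TEXT, regex.IGNORECASE | regex.VERSION0 | regex.FULLCASE)

print(v0, v1, v0_inline, v1_inline, v0_full)
```

The design point is that the version only changes the default: FULLCASE is what actually decides between simple and full folding, so V0 plus FULLCASE behaves like V1 for this comparison.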

I tried some examples myself but couldn't figure out what it does:

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
>>> r = regex.compile("(?V0i)и")
>>> r
regex.Regex('(?V0i)и', flags=regex.I | regex.V0)
>>> r.search("И")
<regex.Match object; span=(0, 1), match='И'>
>>> regex.search("(?V0i)é", "É")
<regex.Match object; span=(0, 1), match='É'>
>>> regex.search("(?V0i)é", "E")
>>> regex.search("(?V1i)é", "E")

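What is the difference between simple case folding and full case folding? Or could you provide an example where a (case-insensitive) regex matches something in V1 but not in V0?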

Accepted Answer

It follows the Unicode case folding table (CaseFolding.txt). An excerpt:

# The entries in this file are in the following machine-readable format:
#
# <code>; <status>; <mapping>; # <name>
#
# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.

[...]

# Usage:
#  A. To do a simple case folding, use the mappings with status C + S.
#  B. To do a full case folding, use the mappings with status C + F.
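The two usage rules above can be sketched in plain Python. This is a hypothetical illustration of the table semantics, not how the regex module works internally: it hard-codes a handful of CaseFolding.txt lines (the sharp s entries quoted further down in this answer, plus the C entries for a few ASCII capitals from the same file), builds one mapping from C + S and one from C + F, and folds strings character by character. The names build_maps, fold, SIMPLE and FULL are made up for this sketch.

```python
# A few lines in CaseFolding.txt format (excerpt; the real file is much longer).
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0045; C; 0065; # LATIN CAPITAL LETTER E
0052; C; 0072; # LATIN CAPITAL LETTER R
0053; C; 0073; # LATIN CAPITAL LETTER S
0054; C; 0074; # LATIN CAPITAL LETTER T
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def build_maps(text):
    """Return (simple, full) dicts mapping a character to its folded string."""
    simple, full = {}, {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = (field.strip() for field in line.split(";")[:3])
        char = chr(int(code, 16))
        folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
        if status in ("C", "S"):  # A. simple folding uses C + S
            simple[char] = folded
        if status in ("C", "F"):  # B. full folding uses C + F
            full[char] = folded
        # Status T (Turkic-specific) entries would be ignored here.
    return simple, full


def fold(s, table):
    """Fold each character; characters without an entry map to themselves."""
    return "".join(table.get(ch, ch) for ch in s)


SIMPLE, FULL = build_maps(SAMPLE)
print(fold("STRAẞE", SIMPLE), fold("STRAẞE", FULL))
```

Comparing fold(a, table) == fold(b, table) gives a caseless comparison: with FULL, "straße" and "STRASSE" compare equal, while with SIMPLE they do not, because ß has no C or S entry and stays a single character. For full folding, Python's built-in str.casefold() already applies the full mappings, so "STRAẞE".casefold() also gives "strasse".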

The two foldings differ only for a relatively small set of characters, for example the small and capital Latin sharp s:

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

[...]

1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
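To answer the second part of the question directly, here is a sketch assuming the regex package is installed: a single ß in the pattern matches the two-character subject "SS" only under full folding, i.e. in V1 by default. The same snippet shows why the é versus E experiment in the question fails in both versions: é (U+00E9) case-folds to itself, and case folding never strips accents.

```python
import regex

# 00DF; F; 0073 0073 -> only full folding relates "ß" to the two letters "SS".
simple = regex.fullmatch("(?iV0)ß", "SS")  # V0: simple folding, ß stays one char
full = regex.fullmatch("(?iV1)ß", "SS")    # V1: full folding, ß behaves like "ss"

# é folds to é and E folds to e, so neither version relates them.
accent_v0 = regex.search("(?iV0)é", "E")
accent_v1 = regex.search("(?iV1)é", "E")

print(simple, full, accent_v0, accent_v1)
```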
