python - Regular expressions - different output in Python 2.6 and 3.3

Question

When I run the same regular expression code, I get different output in Python 2 and Python 3.

Suppose this is the data I want, located somewhere in a web page.

source = ['\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e',
          '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e', 
          '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e', 
          '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e']

When I run the following code in Python 2.6, it works fine. I get exactly the output shown above.

match = re.findall("\x1e\x1e\S+",source)

But when I run it in Python 3.3, like this:

match = re.findall("\x1e\x1e\S+", str(source))

I get this output for the match variable:

['\x1e\x1e5.5.30-log', '\x1e\x1e5.5.30-log', '\x1e\x1e5.5.30-log','\x1e\x1e5.5.30-log']

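So, can you tell me why it does not consume the whole string in Python 3? Why is \x1epcofiowa@localhost\x1epcofiowa_pci\x1e skipped every time? I want the same output as in Python 2.6.

I am clueless at the moment and look forward to your answer. Thanks.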

score 3 · Accepted Answer

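It seems that \S behaves differently in Python 2 and Python 3.

According to the Python 3 re module documentation:

\S - Matches any character which is not a Unicode whitespace character. This is the opposite of \s. If the ASCII flag is used, this becomes the equivalent of [^ \t\n\r\f\v] (but the flag affects the entire regular expression, so in such cases using an explicit [^ \t\n\r\f\v] may be a better choice).

Now, since \x1e (equivalent to U+001E), which occurs after \x1e\x1e5.5.30-log in your string, is a Unicode whitespace character (per a reference on ActiveState), it is not matched by \S in Python 3.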
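The whitespace status of \x1e can be checked directly in Python 3. The snippet below is only a small sketch; the variable names are chosen here for illustration and do not come from the question.

```python
import re
import unicodedata

ch = "\x1e"  # U+001E, the ASCII record separator

# str.isspace() counts a character as whitespace when its general category
# is Zs or its bidirectional class is WS, B, or S; U+001E has class "B".
bidi_class = unicodedata.bidirectional(ch)
is_space = ch.isspace()

# With a str pattern, \s uses the same Unicode notion of whitespace,
# so \S does not match this character.
matches_s = re.fullmatch(r"\s", ch) is not None
matches_big_s = re.fullmatch(r"\S", ch) is not None

print(bidi_class, is_space, matches_s, matches_big_s)
```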

Whereas in Python 2:

\S - Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

So, it only considers the ASCII character set for matching non-whitespace, and hence it matches \x1e.
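To get the Python 2 behaviour in Python 3, either of the two options mentioned in the documentation excerpt should work: the re.ASCII flag or an explicit character class. The following is a minimal sketch, assuming the data is a single str modeled on one entry of the question's source list; the names text, unicode_match, ascii_match and explicit_match are illustrative.

```python
import re

# One entry modeled on the question's `source` list.
text = "\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e"

# Python 3 default for str patterns: \S excludes Unicode whitespace,
# which includes U+001E, so the match stops at the next \x1e.
unicode_match = re.findall("\x1e\x1e\\S+", text)

# Option 1: re.ASCII restricts \S to [^ \t\n\r\f\v] for the whole pattern.
ascii_match = re.findall("\x1e\x1e\\S+", text, flags=re.ASCII)

# Option 2: spell out the ASCII class, leaving the rest of the pattern
# Unicode-aware.
explicit_match = re.findall("\x1e\x1e[^ \t\n\r\f\v]+", text)

print(unicode_match)
print(ascii_match == [text], explicit_match == [text])
```

Option 2 is the narrower change, since re.ASCII also alters \w, \d, \b and \s everywhere in the pattern.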
