regex - Snort/PCRE regular expressions: odd character class syntax

Question

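While parsing the Snort regular expression set, I found a very strange character class syntax, such as [\x80-t] or [\x01-t\x0B\x0C\x0E-t\x80-t], and I can't figure out (really can't) what -t means. I don't even know whether it is standard PCRE or some kind of Snort extension.

Here are some of the regexes that contain these character classes: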

/\x3d\x00\x12\x00..........(.[\x80-t]|...[\x80-t])/smiR
/^To\x3A[^\r\n]+[\x01-t\x0B\x0C\x0E-t\x80-t]/smi

PS: note that \x80-t is not even a valid range in the standard sense, because the character t is \x74.
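The point about the reversed range can be checked quickly. The sketch below uses Python's `re` module as a stand-in, which is an assumption: it is not PCRE and not Snort's engine, but it rejects a range whose end is below its start in the same spirit. The helper name `is_valid_pattern` is made up for illustration.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \x80 (128) is above t (\x74 = 116), so the range runs backwards.
print(is_valid_pattern(r"[\x80-t]"))  # False
# \x01-t is an ordinary range from \x01 up to the letter t.
print(is_valid_pattern(r"[\x01-t]"))  # True
```

So at least in this engine, \x01-t and \x0E-t parse as plain ranges ending at the letter t, and only \x80-t is rejected outright.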

score 4 · Accepted Answer

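This could possibly refer to a different character encoding, one in which t is greater than \x80 and \x80 is not handled in the usual way.

Take EBCDIC scan codes, for instance (see here for a reference).

(But I also have no idea why anyone would write it like this.)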
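To make the EBCDIC idea concrete, here is a small check of where t falls in one EBCDIC code page. Using `cp037` is an assumption; nothing in the question says which encoding, if any, the rule authors had in mind.

```python
# Assumption: code page 037 stands in for "EBCDIC" here.
EBCDIC_CODEC = "cp037"

# Byte value of the letter t in that code page.
t_byte = "t".encode(EBCDIC_CODEC)[0]

print(hex(t_byte))     # 0xa3
print(t_byte > 0x80)   # True, so \x80-t would be a forward range there
```

Under that encoding \x80-t would at least be a well-formed range, although this says nothing about whether Snort ever interprets patterns that way.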

For ASCII I have a wild guess: if -t means "up to the next token minus 1" or, when placed last, "up to the end of the allowed characters", the second regex would read:

To:(not a newline, more than one character)(not a newline)

So basically the [\x01-t\x0B\x0C\x0E-t\x80-t] in this expression means [^\r\n].
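The guess can be spelled out mechanically. The sketch below is purely hypothetical: the "-t" semantics are the guess made in this answer, not documented PCRE or Snort behaviour, and the token list and `expand` helper are invented for illustration. It expands each "X-t" to "X up to the next token minus 1" (or up to \xFF for the last token) and lists which byte values are left out.

```python
# Each token is (start byte, has "-t" suffix), taken from
# [\x01-t\x0B\x0C\x0E-t\x80-t].
TOKENS = [(0x01, True), (0x0B, False), (0x0C, False), (0x0E, True), (0x80, True)]


def expand(tokens, max_byte=0xFF):
    """Expand the class under the guessed '-t' meaning."""
    covered = set()
    for i, (start, open_ended) in enumerate(tokens):
        if not open_ended:
            covered.add(start)
            continue
        # "-t": run up to the next token minus 1, or to max_byte if last.
        end = tokens[i + 1][0] - 1 if i + 1 < len(tokens) else max_byte
        covered.update(range(start, end + 1))
    return covered


covered = expand(TOKENS)
missing = sorted(set(range(256)) - covered)

print([hex(b) for b in missing])  # ['0x0', '0xd']
print(0x0A in covered)            # True
```

Read literally, the guess leaves out only \x00 and \r (\x0D), while \n (\x0A) is still covered by the first range, so the result is close to [^\r\n] but not identical to it.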

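If you apply this to (.[Ç-t]|...[Ç-t]), it would handle any character above 7-bit ASCII, which would also cover all of Unicode (except the first 127 characters).

(That said, I still don't know why anyone would write it this way, but at least it is a coherent explanation other than "it's a bug".)

Maybe this helps: here is what the regexes you posted look like if you write out the \xYY escapes as ASCII: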

/=\NULL\DEVICE_CONTROL_2\NULL.{10}(.[Ç-t]|...[Ç-t])/smiR
/^To:[^\r\n]+[\START_OF_HEADING-t\VERTICALTAB\FORMFEED\SHIFTOUT-tÇ-t]/smi

Looking into \x12, a.k.a. Device Control 2, might help, because it won't show up in text but may well appear in network traffic.

score 3 · Accepted Answer

The second regex matches lines that begin with To: (case-insensitive) followed by at least one character that isn't a line feed or carriage return. Since this is a greedy match, I'd expect \r or \n to be the only possible terminating matches in the [\x01-t\x0B\x0C\x0E-t\x80-t] character class. Note: \r is equivalent to \x0D and \n is equivalent to \x0A. Not sure what -t means but let's pretend it was - instead. Then the character class would be [\x01-\x0B\x0C\x0E-\x80-], which is still a bit convoluted but would make a little bit more sense - i.e. allowing a \n as a terminating character but not \r.
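The "pretend it was just -" reading is easy to try out. This is only a sketch using Python's `re` module, which is an assumption since it is not PCRE, though it treats this simple class the same way (the trailing - before ] is a literal hyphen). The `matches` helper is made up for illustration.

```python
import re

# The character class with every "-t" replaced by "-".
CLASS = re.compile(r"[\x01-\x0B\x0C\x0E-\x80-]")


def matches(ch: str) -> bool:
    """True if the single character ch is in the class."""
    return CLASS.fullmatch(ch) is not None


print(matches("\n"))    # True:  \x0A falls inside \x01-\x0B
print(matches("\r"))    # False: \x0D sits between \x0C and \x0E
print(matches("\x00"))  # False: the class starts at \x01
```

This agrees with the observation above that the rewritten class accepts \n but not \r.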

This is a very long shot but is there any chance this could be some kind of search-and-replace gone wrong?! (Guess this can probably be quickly discounted if there are other regexes that have normal ranges without the t.)
