.net - Regular expression that allows alphanumerics, at most one space, etc.

Question

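I'm opening this thread, which is very similar to another one, because I can't solve the problem: I have an input field that allows an alphanumeric string, optionally followed by a single space as a separator, then another optional alphanumeric string, and so on. I found this regular expression: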

^([0-9a-zA-z]+ ?)*$

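It works! But as soon as I have 2 consecutive spaces in a long sentence, and those 2 spaces are far into the sentence, performance is really bad. In the example below, if I put the 2 spaces at the beginning of the sentence, the result comes back within half a second. But if they are further in, it takes 10 seconds or more.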

dzdff5464zdiophjazdioj ttttttttt zoddzdffdziophjazdioj ttttttttt zoddzdffdzdff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zo999  ddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdff

The 2 spaces are at 999. Do you have any ideas or suggestions for improving this regex?

Thanks and best regards,

PF

PS: you can reproduce the problem as soon as you enter any invalid character in the string, not only 2 spaces.

Edit: another example: 12345678901234567890' ==> 20 chars + 1 invalid char => the result is immediate. Add 5 valid chars and executing the regex takes 5 seconds! 1234567890123456789012345'

score 1 · Accepted Answer

I suggest changing the expression to something like this:

(?i)^[0-9a-z]+(?:\s[0-9a-z]+)*$


This is functionally similar in that it matches alphanumeric words delimited by a single space. A major difference is that I moved the initial word check to the front of the expression, then made a non-capture group (?:...) for the remaining space-delimited words.

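Non-capture groups (?:...) are faster than capture groups (...) because the regex engine doesn't need to retain the matched values. And by moving the space \s to the front of the repeated word group, the engine doesn't need to validate that the first character in the group is included in the character class: every repetition must start with exactly one whitespace character, so a run of alphanumerics can only be consumed in one way. In the original ^([0-9a-zA-z]+ ?)*$ the space is optional, so on a non-matching input the engine retries every way of splitting each word across repetitions, and that backtracking is what makes it so slow.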
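
As a rough illustration of what the suggested pattern accepts and rejects, here is a minimal JavaScript sketch (JavaScript being the language already used for testing elsewhere in this thread). The helper name isValidInput and the sample strings are hypothetical. The i flag stands in for the leading (?i), which JavaScript regex literals do not accept; in .NET the pattern can be used as written.

```javascript
// Suggested pattern; the `i` flag replaces the inline (?i) of the .NET version.
const pattern = /^[0-9a-z]+(?:\s[0-9a-z]+)*$/i;

// Hypothetical helper: true when the input consists of alphanumeric words
// separated by exactly one whitespace character.
function isValidInput(text) {
  return pattern.test(text);
}

// Illustrative inputs: two valid strings, then a double space, a trailing
// space, an empty string, and a non-alphanumeric character.
const samples = ["abc 123", "ABC def 42", "abc  123", "abc ", "", "abc_def"];
for (const s of samples) {
  console.log(JSON.stringify(s), isValidInput(s));
}
```

Two differences from the original expression are worth knowing: this pattern rejects an empty string and a trailing space, both of which ^([0-9a-zA-z]+ ?)*$ accepts, and \s also matches tabs and other whitespace, so a literal space can be substituted if only spaces should be allowed.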

You also have a typo in your character class [0-9a-zA-z]: the last z should probably be upper case. As written, the A-z range also covers the six ASCII characters that sit between Z and a ([, \, ], ^, _ and `), so those would be accepted as well. In my expression I simply added (?i) to the beginning to switch the regex engine into case-insensitive mode, and reduced the character class to [0-9a-z].
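
To make the effect of the typo concrete: in ASCII, the range A-z spans code points 65 to 122, which includes the punctuation characters between Z (90) and a (97). The short JavaScript sketch below, with variable names chosen purely for illustration, checks each of them against the mistyped and the corrected class.

```javascript
// Character class as written in the question (A-z) versus the intended one (A-Z).
const typoClass = /^[0-9a-zA-z]+$/;
const fixedClass = /^[0-9a-zA-Z]+$/;

// The six ASCII characters that sit between 'Z' (90) and 'a' (97).
const extras = ["[", "\\", "]", "^", "_", "`"];

for (const ch of extras) {
  // The mistyped class accepts each character; the corrected class rejects it.
  console.log(ch, "typo:", typoClass.test(ch), "fixed:", fixedClass.test(ch));
}
```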

In my testing I see that your expression ^([0-9a-z]+ ?)*$ takes about 0.03 seconds to process your sample text with 2 extra spaces toward the end. My recommended expression completes the same test in about 0.000022 seconds. WOW that's an amazing delta.
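
Timings will differ by engine and machine, but the gap is easy to reproduce on a small scale. The following JavaScript sketch is a stand-in under stated assumptions, not the benchmark used above: it times both expressions on the question's own failure case, a run of digits followed by one invalid character, at a few short lengths. The helper name timeRegex and the chosen lengths are illustrative, and performance.now() is assumed to be available as a global (browsers and recent Node.js versions). The inputs are kept short on purpose, since in a backtracking engine the original pattern's running time roughly doubles with each additional valid character.

```javascript
const slow = /^([0-9a-zA-z]+ ?)*$/;           // expression from the question
const fast = /^[0-9a-z]+(?:\s[0-9a-z]+)*$/i;  // suggested replacement (i flag instead of (?i))

// Illustrative helper: runs one test and reports the result and elapsed milliseconds.
function timeRegex(re, text) {
  const start = performance.now();
  const matched = re.test(text);
  return { matched, ms: performance.now() - start };
}

const results = [];
for (const length of [10, 15, 20]) {
  // Valid digits followed by a single invalid character, as in the question's edit.
  const input = "1234567890".repeat(2).slice(0, length) + "'";
  const a = timeRegex(slow, input);
  const b = timeRegex(fast, input);
  results.push({ length, slow: a, fast: b });
  console.log(`length ${length}: slow ${a.ms.toFixed(3)} ms, fast ${b.ms.toFixed(3)} ms`);
}
```

Both expressions reject every one of these inputs; only the time needed to reach that answer differs.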

score 0 · Accepted Answer

Here is a simpler regex that uses \w (the word class):

^([\w]+(\s*))$

Test

It is instantaneous in JavaScript:

var input = "dzdff5464zdiophjazdioj ttttttttt zoddzdffdziophjazdioj ttttttttt  zoddzdffdzdff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt  zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt  zoddzdfff ttttt zo999  ddzdfff ttttt zoddzdfff ttttt zoddzdff";

var re = /([\w]+(\s*))/g;

console.log(input.replace(re, "boo"));
