I have absolutely no experience with regular expressions and I need some help setting one up to match a string against. This is for phone number validation. I need to make sure that a string a user inputs has only upper case letters A-Z, numbers 0-9, open/close parentheses [()], and hyphens (-). I also don't know what string method I need to use either match or string.
1 Answer
RegEx is explained poorly all over the web. I don't fault anyone for asking more general questions about it, and this is different from the other post, which is more do-it-for-me google-evasion than a specific question. The characters you asked about:
[0-9] or \d
A pattern written between forward slashes, such as /\d/, is a regular expression literal. This is preferable to using the RegExp constructor, because with the constructor you end up having to escape your escape backslashes, which gets real ugly.
You can actually use regEx literals in a lot of string methods, like replace, split, etc.
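To make the literal-versus-constructor point concrete, here is a rough sketch (the sample strings are made up for illustration). It also shows the difference between test, which lives on the regex and answers yes/no, and match, which lives on the string and hands back what was found:

```javascript
// The same pattern written two ways: one or more digits.
const literal = /\d+/;
const constructed = new RegExp("\\d+"); // the backslash has to be doubled inside a string

// test() is a regex method and returns true/false, which suits validation.
const hasDigits = literal.test("call 555 now");

// match() is a string method and returns the match details (or null if nothing matched).
const firstRun = "call 555 now".match(constructed)[0];

// Literals can be passed straight to other string methods too.
const digitsOnly = "(123) 456-7890".replace(/[()\- ]/g, "");
const parts = "123-456.7890".split(/[.\-]/);

console.log(hasDigits, firstRun, digitsOnly, parts);
```

For a plain "is this input OK?" check, test is usually the one you want, since you only care about the boolean.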
Without special characters following, any non-special character is about matching one character at that position in a string. Stuff in []
is a class and can match more than one KIND of character, but only the single character at the position following the last position matched. You might find [.\- ] useful for identifying non-number characters in telephone numbers (the hyphen is escaped so it isn't read as a range, which [.- ] would be, and an invalid one at that). You can also express ranges in character classes, e.g. [a-hA-H]
or [4-9].
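A quick sketch of the one-class-one-position idea, with variable names that are just for illustration:

```javascript
// One class covers one position in the string, however many kinds of character it lists.
const separator = /[.\- ]/;   // a dot, a hyphen, or a space
const earlyLetter = /[a-hA-H]/; // any letter a through h, either case

const sepChecks = ["-", ".", " ", "x"].map((ch) => separator.test(ch));
const letterChecks = ["c", "H", "z"].map((ch) => earlyLetter.test(ch));

// A class consumes a single character, so two digits in 4-9 need two classes.
const twoHigh = /[4-9][4-9]/.test("49");
const oneLow = /[4-9][4-9]/.test("43");

console.log(sepChecks, letterChecks, twoHigh, oneLow);
```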
But one str position at a time goes out the window when you start using the follow-up characters:
? - one or none
* - 0 or many
+ - 1 or more
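Roughly, those three behave like this (the words are arbitrary examples, nothing phone-specific):

```javascript
const optionalU = /colou?r/; // ? : the "u" may appear once or not at all
const anyOs = /go*gle/;       // * : zero or more "o"
const someOs = /go+gle/;      // + : at least one "o"

const questionMark = ["color", "colour", "colouur"].map((s) => optionalU.test(s));
const star = ["ggle", "gogle", "gooogle"].map((s) => anyOs.test(s));
const plus = ["ggle", "gogle", "gooogle"].map((s) => someOs.test(s));

console.log(questionMark, star, plus);
```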
Avoid the .
wildcard character. It is inefficient. For some reason that I suspect goes all the way down to implementation in assembly for efficiency's sake, it checks against every single possibility rather than the 1-2 teletype whitespace characters it actually doesn't represent, which there is no honest use for on a computer. More importantly, the better-performing alternative is much more powerful and helpful. Negated character classes are much faster. [^<]*
represents 0 or more positions of anything that is NOT a < character.
Very handy stuff for XML/SGML-style parsing which, in spite of what many on Stack have said, is perfectly feasible with regEx, which is no longer technically confined to "regular" languages. You have to be aware of what you're looking for with something that allows as much sloppiness as somebody else's HTML, but that's just a 'duh' in my book.
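As a small illustration of [^<]* on markup (the HTML snippet is invented for the example), compare it with a greedy .* in the same spot:

```javascript
const html = "<b>bold</b><i>italic</i>";

// After "<b>", capture everything up to (not including) the next "<".
const inner = html.match(/<b>([^<]*)</)[1];

// The greedy wildcard runs to the end and backs up only as far as the LAST "<".
const greedy = html.match(/<b>(.*)</)[1];

console.log(inner);  // bold
console.log(greedy); // bold</b><i>italic
```

The negated class stops exactly where you told it to, which is what makes it easier to reason about when tokenizing.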
Crockford warns against negating character classes in JSLint. Crockford is painfully wrong on that count. They are not only much more efficient, they also make it much easier to think through how to tokenize stuff. If there is a security risk, you can set explicit limits on the number of characters matched with {}
braces, e.g. p{2,5}
- which matches two to five p chars, or {5}
for exactly 5, or {0,5}
for up to 5, or {5,}
for at least 5. (JavaScript has no {,5} shorthand; without the u flag it is read as literal text, so write the 0.)
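Since those are worth testing, here is a sketch that does so. The patterns are wrapped in ^ and $ so the whole string has to match, and the last one shows how JavaScript (without the u flag) treats the {,5} form:

```javascript
const twoToFive = /^p{2,5}$/;
const exactlyFive = /^p{5}$/;
const atLeastFive = /^p{5,}$/;
const upToFive = /^p{0,5}$/;
// No lower bound given: JavaScript does not read this as a quantifier at all.
const literalBraces = /^p{,5}$/;

const results = {
  twoToFive: ["p", "ppp", "pppppp"].map((s) => twoToFive.test(s)),
  exactlyFive: ["pppp", "ppppp"].map((s) => exactlyFive.test(s)),
  atLeastFive: ["pppp", "pppppppp"].map((s) => atLeastFive.test(s)),
  upToFive: ["", "ppppp", "pppppp"].map((s) => upToFive.test(s)),
  literalBraces: ["ppp", "p{,5}"].map((s) => literalBraces.test(s)),
};

console.log(results);
```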
Other random stuff you should look up:
(ph|f) - ph or f - helpful for finding phish and fish (when a class won't do, basically)
^ - represents beginning of a string - think of it as a condition for the next character more than a character itself. Yes, it also negates character classes.
$ - represents end of a string - same caveat as above but on the previous character.
\ - used to escape special symbols. Note: a lot of special symbols that have no meaning in character classes require no \ inside []
\s, \w, \d - These represent commonly used sets of characters. The first is pretty much all whitespace (js-style escapes typically have regEx equivalents), followed by \w for word characters (class equivalent [a-zA-Z0-9_]) and \d for digits ([0-9]). Capitalize any of these for the exact opposite.
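Putting the anchors and a character class together gives one possible answer to the original question. This is only a sketch: hasOnlyAllowedChars is a made-up helper name, and it checks the allowed characters only, not whether the result is a sensible phone number.

```javascript
// Hypothetical helper: true when the input contains nothing but A-Z, 0-9, parentheses and hyphens.
function hasOnlyAllowedChars(input) {
  // ^ and $ pin the class to the whole string; + requires at least one character.
  return /^[A-Z0-9()\-]+$/.test(input);
}

const fishy = /^(ph|f)ish$/;      // alternation: "phish" or "fish"
const decimal = /^\d+\.\d+$/;     // \. is a literal dot, \d is a digit
const noWhitespace = /^\S+$/;     // capital S: anything that is NOT whitespace

const allowed = ["(555)-CALL-NOW", "555 1234", "555-call", ""].map(hasOnlyAllowedChars);
const fish = ["phish", "fish", "pfish"].map((s) => fishy.test(s));
const decimals = ["3.14", "3x14"].map((s) => decimal.test(s));
const spaces = ["abc", "a b"].map((s) => noWhitespace.test(s));

console.log(allowed, fish, decimals, spaces);
```

Without the ^ and $, the test would pass as soon as any one allowed character showed up anywhere in the input, which is not much of a validation.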
There's more, like back-references, and lookaheads whose use-case scenarios are worth knowing but this is the commonly used stuff I actually remember from regular experience (bwaahaahaa).
I assume you're looking for non US since you have that A-Z concern and I'm sure there's plenty of US phone-numbers regExes out there but I'd probably do something like this for US numbers:
/\(?\d{3}[)\-. ]?\d{3}[\-. ]?\d{4}/
to match: 123-456-7890
123 456 7890
But also perhaps messily allows:
...which I'm willing to live with for the sake of avoiding complexity. Resist the temptation to do it all with one expression. Sometimes it's much cleaner to eliminate trailing/leading whitespace, for instance, and then hit something with an expression. Split and join methods are very powerful for tokenizing.
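Here is a sketch of trying that US pattern out, trimming first as suggested. looksLikeUsPhone is a hypothetical name, and the test strings are my own picks, including a couple of loose ones the pattern lets through:

```javascript
const usPhone = /\(?\d{3}[)\-. ]?\d{3}[\-. ]?\d{4}/;

// Hypothetical helper: strip leading/trailing whitespace, then apply the pattern.
function looksLikeUsPhone(input) {
  return usPhone.test(input.trim());
}

const clean = ["  123-456-7890  ", "123 456 7890", "(123)456-7890"].map(looksLikeUsPhone);

// Loose matches: an unclosed paren, or a stray ")" used as a separator.
const messy = ["(123-456-7890", "123)456.7890"].map(looksLikeUsPhone);

// Rejected: a space after the closing paren, or digits grouped 2-4-4.
const rejected = ["(123) 456-7890", "12-3456-7890"].map(looksLikeUsPhone);

// split and join as a blunt normalizer before validating.
const normalized = "123 456 7890".split(" ").join("");

console.log(clean, messy, rejected, normalized);
```

Note that the pattern has no ^ or $, so it will also find a phone-shaped run inside a longer string; add the anchors if the whole input has to be the number.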
If this goes like a usual regEx conversation, somebody will shortly point out something I missed in my pattern. So yeah, test 'em out on stuff. There are sites that let you set the expression and then just plug in characters to try and break it.