I suggest:
Split on the @ symbol.
Split on the last period (using String#substring and String#lastIndexOf).
Now you have the local part, the domain and the TLD in separate strings and can validate each with if-statements. If any rules apply to all of them (2 consecutive periods?), check those before splitting.
Much simpler to get right, much simpler to understand, much simpler to maintain.
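A minimal sketch of the split-and-check approach might look like the following. The class and method names are hypothetical, and the individual rules (no leading or trailing period in the local part, no "--" in the domain, letters-only TLD) are assumptions chosen to mirror the regex discussed here, not a complete email specification. Splitting on the last @ is also an assumption.

```java
class EmailSplitCheck {

    // Hypothetical helper; every rule below is an illustrative assumption.
    static boolean isPlausibleEmail(String email) {
        // Rule that applies to all parts: no two consecutive periods.
        if (email == null || email.contains("..")) {
            return false;
        }

        // Split on the @ symbol (assumption: the last @ is the separator).
        int at = email.lastIndexOf('@');
        if (at < 1) {
            return false; // no @ at all, or an empty local part
        }
        String local = email.substring(0, at);
        String rest = email.substring(at + 1);

        // Split on the last period to separate the domain from the TLD.
        int dot = rest.lastIndexOf('.');
        if (dot < 1 || dot == rest.length() - 1) {
            return false; // no period, empty domain, or empty TLD
        }
        String domain = rest.substring(0, dot);
        String tld = rest.substring(dot + 1);

        // Local part: must not start or end with a period.
        if (local.startsWith(".") || local.endsWith(".")) {
            return false;
        }

        // Domain: no "--", and no '-' or '_' at either end.
        if (domain.contains("--")
                || isDashOrUnderscore(domain.charAt(0))
                || isDashOrUnderscore(domain.charAt(domain.length() - 1))) {
            return false;
        }

        // TLD: ASCII letters only.
        return tld.chars().allMatch(c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
    }

    private static boolean isDashOrUnderscore(char c) {
        return c == '-' || c == '_';
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com", "john..doe@example.com", "john.@example.com", "john@-example.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isPlausibleEmail(s));
        }
    }
}
```

A whitelist of allowed characters for the local part and domain is left out for brevity; it would be one more if-statement per part.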
But, if you really want to stick to regex, here's a few things I've seen:
The [^.] before the @ should be (?<!\.), otherwise the last character before the @ can be just about anything.
A . matches just one character, so (?!.\-{2,}.) and (?!.\.{2,}.) do not do what you think they do. Changing the leading . to .* seems to fix it. And you don't need to check any characters after the thing you're looking for, so the trailing . can go.
It hasn't been explicitly stated, but I presume the domain and TLD can't contain 2 successive periods either. If this is allowed, the first part of the regex needs to be (?!.*\.{2,}.*@) to stop at the @.
If you use String#matches, the ^ and $ aren't required.
There are some unneeded ()'s.
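To make the points about lookarounds and String#matches concrete, here is a small sketch. The patterns are simplified stand-ins I made up for illustration (not the regex from the question), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class RegexNotesDemo {

    // A single '.' in the lookahead only inspects a fixed window at the start.
    static final String SINGLE_DOT = "(?!.\\.{2,}.).*";
    // '.*' lets the lookahead scan the whole input for two consecutive periods.
    static final String DOT_STAR = "(?!.*\\.{2,}).*";

    // [^.] consumes a character, so any non-period (even '!') may precede the @.
    static final String CONSUMING = "[a-z.]+[^.]@";
    // The lookbehind only asserts that the previous character is not a period.
    static final String LOOKBEHIND = "[a-z.]+(?<!\\.)@";

    // find() looks for the pattern anywhere; String#matches needs the whole input to match.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("ab..cd".matches(SINGLE_DOT)); // true: the double period slips through
        System.out.println("ab..cd".matches(DOT_STAR));   // false: rejected as intended

        System.out.println("ab!@".matches(CONSUMING));    // true: '!' was never in the allowed set
        System.out.println("ab!@".matches(LOOKBEHIND));   // false
        System.out.println("ab.@".matches(LOOKBEHIND));   // false: period right before the @
        System.out.println("ab@".matches(LOOKBEHIND));    // true

        System.out.println("abc".matches("b"));           // false: matches() is implicitly anchored
        System.out.println(findsAnywhere("b", "abc"));    // true
        System.out.println("b".matches("^b$"));           // true: same result as "b".matches("b")
    }
}
```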
Final regex:
(?!.*\.{2,})[^.][-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+(?<!\.)@(?!.*\-{2,})[^-_][-a-zA-Z0-9_.]+[^-_]\.[a-zA-Z]+
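As a quick sanity check, the regex can be exercised with String#matches roughly as follows. This is only a sketch: the class name, method name and sample addresses are my own, and the TLD class is written as [a-zA-Z] here.

```java
class FinalRegexDemo {

    // The final regex as a Java string literal (backslashes doubled).
    static final String EMAIL_REGEX =
        "(?!.*\\.{2,})[^.][-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+(?<!\\.)"
        + "@(?!.*\\-{2,})[^-_][-a-zA-Z0-9_.]+[^-_]\\.[a-zA-Z]+";

    // Hypothetical helper: String#matches anchors implicitly, so no ^ or $ is needed.
    static boolean matchesFinalRegex(String email) {
        return email.matches(EMAIL_REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com",   // expected to match
            "john..doe@example.com",  // two consecutive periods
            "john.@example.com",      // local part ends with a period
            ".john@example.com",      // local part starts with a period
            "john@exa--mple.com",     // two consecutive hyphens in the domain
            "john@-example.com"       // domain starts with a hyphen
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesFinalRegex(s));
        }
    }
}
```

Only the first sample should print true. Note that the quantifiers require at least two characters in the local part and at least three in the domain before the TLD, so very short addresses are rejected as well.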
If you choose to stick to regex, I suggest extensive commenting:
String regex =
"(?!.*\\.{2,})" // doesn't contain 2 consecutive .'s
// local part
+ "[^.]" // doesn't start with a .
+ "[-a-zA-Z0-9_.!@#$%^&*(),'+=`{|}~-]+" // valid chars for local part
+ "(?<!\\.)" // last char of local part isn't a .
// at symbol
+ "@"
// domain
...
It might seem like overkill, but you'll wish you had if you try to maintain it a few months down the line, especially if you haven't touched any regex in those months.