regex - Meaning of a regular expression

Question

I am trying to understand the regular expression language, but it is so difficult.

I have read some tutorials, but I don't really get it.

I have this regex:

N\b*:\b*[^:]*

Can someone please tell me what this regular expression means?

Thank you very much!

score 2 · Accepted Answer

Breaking the regex down to its components we have:

N
\b*
:
\b*
[^:]*

N and : are just literals. Nothing to say about those.

\b is the word boundary assertion (an anchor, not a character class). It matches at a position where a word character (a letter, digit or underscore) sits next to a non-word character or next to the start or end of the string; in other words, at the beginning or end of a word. This is a bit weird because it matches ("consumes") no characters. In the string "foo bar" there are 4 word boundaries (before f, after o, before b, after r).
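To see these positions concretely, here is a small sketch that lists every offset where \b matches in "foo bar". It assumes Python's re module (the question does not name an engine), and the helper name word_boundaries is made up for this illustration.

```python
import re

def word_boundaries(text):
    # Each match of \b is zero-width, so its start offset is the boundary position.
    return [m.start() for m in re.finditer(r"\b", text)]

positions = word_boundaries("foo bar")
print(positions)  # [0, 3, 4, 7]: before f, after o, before b, after r
```

Every match here is empty: the assertion only checks a position and never takes a character from the string.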

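A * means that the previous element can be repeated any number of times (0, 1, 2 or more). Here that means you're accepting any number of consecutive word boundaries. Since zero repetitions are allowed and \b consumes nothing, \b* can always succeed, so it never actually restricts what the regex matches.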

Finally the brackets [ ] define a character class. Inside this class there is ^:. A ^ as the first character of a class means "negate". For example the class [a] matches the character a, but [^a] matches any single character except a. So the class [^:] matches any single character except :. Finally we have a * again, meaning you can match this class any number of times.
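A short sketch of the difference between a class and its negation, again assuming Python's re module; the sample strings are made up for illustration.

```python
import re

# [a] matches the single character "a"; [^a] matches any one character that is not "a".
only_a = re.findall(r"[a]", "banana")
not_a = re.findall(r"[^a]", "banana")
print(only_a)  # ['a', 'a', 'a']
print(not_a)   # ['b', 'n', 'n']

# [^:]* takes the longest run of non-colon characters, which may be empty.
run = re.match(r"[^:]*", "foobar:baz").group()
empty = re.match(r"[^:]*", ":baz").group()
print(repr(run), repr(empty))  # 'foobar' ''
```

Note that [^:]* still succeeds on ":baz"; it simply matches the empty string, because * allows zero repetitions.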

So putting everything together here's what the regex means:

match the letter N
match any number of word boundaries
match the character :
match any number of word boundaries
match any number of characters except :.

Here are a few examples:

N: - matches, it's the simplest match
N - doesn't match, there's no :
N:foobar - matches
N:foobar:baz - only N:foobar is matched; the match stops before the second :, so this fails if the regex has to match the whole string.
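Whether a string "matches" depends on how the regex is applied: searching for the pattern somewhere in the string, or requiring it to cover the whole string. The sketch below assumes Python's re module. As far as I know, that module does not accept a quantifier written directly on \b (it reports "nothing to repeat"), so the code uses N:[^:]* as a stand-in. That substitution is my assumption: because \b* may always match zero boundaries, the stand-in should accept the same text in engines that do allow \b*. The helper name check is hypothetical.

```python
import re

# Stand-in for N\b*:\b*[^:]* (assumption: \b* can always match zero boundaries).
PATTERN = re.compile(r"N:[^:]*")

def check(text):
    """Return (text found by an unanchored search, whether the whole string matches)."""
    found = PATTERN.search(text)
    return (found.group() if found else None, PATTERN.fullmatch(text) is not None)

for sample in ["N:", "N", "N:foobar", "N:foobar:baz"]:
    print(sample, check(sample))
# N: ('N:', True)
# N (None, False)
# N:foobar ('N:foobar', True)
# N:foobar:baz ('N:foobar', False)
```

The last case is the interesting one: a search finds N:foobar inside the string, but a whole-string match fails because of the second colon.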

This whole word boundaries business is not very intuitive and it isn't clear without context what is meant here. Matching word boundaries around the : doesn't make much sense. But at least you should be able to understand the regex better already.

score 1 · Accepted Answer

I'd recommend using debuggex.com

N\b*:\b*[^:]*

Regular expression image

Where

N, : are literals
\b represents a word start/end

Hint: After getting a tiny bit familiar with the basics, I'd say:

Let's play a game:

Put a finger of your left hand on the black dot and a finger of your right hand on the first character of the string to be matched, then try to reach the white dot with your left finger.

The rules are:

you can only pass through a rectangle if the character you are currently on is matched by it
once you go through a rectangle you have to advance your right hand finger by one character
you are not allowed to go backwards (with either hand)
- exceptions are the loops (the ones below the line joining the black & white dots)
if you reach the white dot you have a match

score 0 · Accepted Answer

Some context would help in describing what this regex is for, but breaking this particular regular expression down:

N - The letter 'N'.
\b* - Zero or more word boundaries (that is, positions where a word starts or ends; no characters are consumed.)
: - A colon.
\b* - Zero or more word boundaries again.
[^:]* - A run of zero or more characters, continuing until either a : or the end of the input is reached (newlines are included, since [^:] matches them too).

In the string

LMN:  Testing:123

this would match

N:  Testing
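This result can be checked with a short sketch. It assumes Python's re module, which as far as I know refuses a quantifier written directly on \b, so N:[^:]* is used as a stand-in; that is my assumption, justified by \b* always being able to match zero boundaries.

```python
import re

text = "LMN:  Testing:123"

# Stand-in for N\b*:\b*[^:]* (assumption: the \b* parts never restrict the match).
match = re.search(r"N:[^:]*", text)

print(repr(match.group()))  # 'N:  Testing'
print(match.span())         # (2, 13)
```

The search skips the leading "LM", starts at the N, and stops just before the second colon, so ":123" is left out.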
