regex - How do you debug a regex?

Question

Regular expressions can become quite complex. The lack of white space makes them difficult to read. I can't step though a regular expression with a debugger. So how do experts debug complex regular expressions?

score 71 · Accepted Answer

You buy RegexBuddy and use its built in debug feature. If you work with regexes more than twice a year, you will make this money back in time saved in no time. RegexBuddy will also help you to create simple and complex regular expressions, and even generate the code for you in a variety of languages.

alt text

Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.

score 52 · Accepted Answer

With Perl 5.10, use re 'debug';. (Or debugcolor, but I can't format the output properly on Stack Overflow.)

$ perl -Mre=debug -e'"foobar"=~/(.)\1/'
Compiling REx "(.)\1"
Final program:
   1: OPEN1 (3)
   3:   REG_ANY (4)
   4: CLOSE1 (6)
   6: REF1 (8)
   8: END (0)
minlen 1
Matching REx "(.)\1" against "foobar"
   0 <> <foobar>             |  1:OPEN1(3)
   0 <> <foobar>             |  3:REG_ANY(4)
   1 <f> <oobar>             |  4:CLOSE1(6)
   1 <f> <oobar>             |  6:REF1(8)
                                  failed...
   1 <f> <oobar>             |  1:OPEN1(3)
   1 <f> <oobar>             |  3:REG_ANY(4)
   2 <fo> <obar>             |  4:CLOSE1(6)
   2 <fo> <obar>             |  6:REF1(8)
   3 <foo> <bar>             |  8:END(0)
Match successful!
Freeing REx: "(.)\1"

Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x modifier. With pcre, there is the PCRE_EXTENDED flag.

"foobar" =~ /
    (.)  # any character, followed by a
    \1   # repeat of previously matched character
/x;

pcre *pat = pcre_compile("(.)  # any character, followed by a\n"
                         "\\1  # repeat of previously matched character\n",
                         PCRE_EXTENDED,
                         ...);
pcre_exec(pat, NULL, "foobar", ...);

score 30 · Accepted Answer

30

I'll add another so that I don't forget it : debuggex

It's good because it's very visual: Photo of the Debuggex regex helper

于 2013-03-01T23:46:13.803 回答

score 28 · Accepted Answer

When I get stuck on a regex I usually turn to this: https://regexr.com/

Its perfect for quickly testing where something is going wrong.

score 19 · Accepted Answer

I use Kodos - The Python Regular Expression Debugger:

Kodos is a Python GUI utility for creating, testing and debugging regular expressions for the Python programming language. Kodos should aid any developer to efficiently and effortlessly develop regular expressions in Python. Since Python's implementation of regular expressions is based on the PCRE standard, Kodos should benefit developers in other programming languages that also adhere to the PCRE standard (Perl, PHP, etc...).

(...)

Runs on Linux, Unix, Windows, Mac.

score 13 · Accepted Answer

I think they don't. If your regexp is too complicated, and problematic to the point you need a debugger, you should create a specific parser, or use another method. It will be much more readable and maintainable.

score 12 · Accepted Answer

There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author Dr. Edmund Weitz stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.

score 9 · Accepted Answer

I've just seen a presentation of Regexp::Debugger by its creator: Damian Conway. Very impressive stuff: run inplace or using a command line tool (rxrx), interactively or on a "logged" execution file (stored in JSON), step forward and backward at any point, stop on breakpoints or events, colored output (user configurable), heat maps on regexp and string for optimization, etc...

Available on CPAN for free: http://search.cpan.org/~dconway/Regexp-Debugger/lib/Regexp/Debugger.pm

score 7 · Accepted Answer

I use this online tool to debug my regex:

https://www.regextester.com/

But yeah, it can't beat RegexBuddy.

score 6 · Accepted Answer

I debug my regexes with my own eyes. That's why I use /x modifier, write comments for them and split them in parts. Read Jeffrey Friedl's Mastering Regular Expressions to learn how to develop fast and readable regular expressions. Various regex debugging tools just provoke voodoo programming.

score 6 · Accepted Answer

As for me I usually use pcretest utility which can dump the byte code of any regex, and usually it is much more easier to read (for me at least). Example:

PCRE version 8.30-PT1 2012-01-01

  re> /ab|c[de]/iB
------------------------------------------------------------------
  0   7 Bra
  3  /i ab
  7  38 Alt
 10  /i c
 12     [DEde]
 45  45 Ket
 48     End
------------------------------------------------------------------

score 4 · Accepted Answer

I use:

http://regexlib.com/RETester.aspx

You can also try Regex Hero (uses Silverlight):

http://regexhero.net/tester/

score 3 · Accepted Answer

If I'm feeling stuck, I like to go backward and generate the regex directly from a sample text using txt2re (although I usually end up tweaking the resulting regex by hand).

score 3 · Accepted Answer

If you're a Mac user, I just came across this one:

http://atastypixel.com/blog/reginald-regex-explorer/

It's free, and simple to use, and it's been a great help for me to get to grips with RegExs in general.

score 2 · Accepted Answer

Have a look at the (non-free) tools on regular-expressions.info. RegexBuddy in particular. Here is Jeff Atwood's post on the subject.

score 2 · Accepted Answer

Writing reg exes using a notation like PCREs is like writing assembler: it's fine if you can just see the corresponding finite state automata in your head, but it can get difficult to maintain very quickly.

The reasons for not using a debugger are much the same as for not using a debugger with a programming language: you can fix local mistakes, but they won't help you solve the design problems that led you to make the local mistakes in the first place.

The more reflective way is to use data representations to generate regexps in your programming language, and have appropriate abstractions to build them. Olin Shiver's introduction to his scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.

score 2 · Accepted Answer

I often use pcretest - hardly a "debugger" but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.

In general I agree with the guy above to whom needing a regex debugger is a code smell. For me the hardest about using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.

score 2 · Accepted Answer

I often use Ruby based regexp tester Rubular

and also in Emacs use M-x re-builder

Firefox also has a useful extension

score 2 · Accepted Answer

2

I use the Rx Toolkit included with ActiveState Komodo.

于 2011-04-03T22:28:26.970 回答

score 1 · Accepted Answer

1

You could try this one http://www.pagecolumn.com/tool/regtest.htm

于 2010-02-27T19:58:54.213 回答

score 0 · Accepted Answer

For me, after having eyeballed the regex (as I'm fairly fluent, and nearly always use /x or equivalent), I might debug rather than test if I am unsure if I would hit some degenerate matching (i.e. something that excessively backtracks) to see if I could solve such issues by modifying the greedyness of an operator for example.

To do that, I'd use one of the methods mentioned above: pcretest, RegexBuddy (if my current workplace has licensed it) or similar, and sometimes I time it in Linqpad if I'm working in C# regexes.

(The perl trick is a new one for me, so will probably add that to my regex toolkit too.)

regex - How do you debug a regex?

21 回答 21

Related

Reference