Regular expressions can become quite complex. The lack of white space makes them difficult to read. I can't step though a regular expression with a debugger. So how do experts debug complex regular expressions?
21 回答
You buy RegexBuddy and use its built in debug feature. If you work with regexes more than twice a year, you will make this money back in time saved in no time. RegexBuddy will also help you to create simple and complex regular expressions, and even generate the code for you in a variety of languages.
Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.
With Perl 5.10, use re 'debug';
. (Or debugcolor
, but I can't format the output properly on Stack Overflow.)
$ perl -Mre=debug -e'"foobar"=~/(.)\1/' Compiling REx "(.)\1" Final program: 1: OPEN1 (3) 3: REG_ANY (4) 4: CLOSE1 (6) 6: REF1 (8) 8: END (0) minlen 1 Matching REx "(.)\1" against "foobar" 0 <> <foobar> | 1:OPEN1(3) 0 <> <foobar> | 3:REG_ANY(4) 1 <f> <oobar> | 4:CLOSE1(6) 1 <f> <oobar> | 6:REF1(8) failed... 1 <f> <oobar> | 1:OPEN1(3) 1 <f> <oobar> | 3:REG_ANY(4) 2 <fo> <obar> | 4:CLOSE1(6) 2 <fo> <obar> | 6:REF1(8) 3 <foo> <bar> | 8:END(0) Match successful! Freeing REx: "(.)\1"
Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x
modifier. With pcre
, there is the PCRE_EXTENDED
flag.
"foobar" =~ /
(.) # any character, followed by a
\1 # repeat of previously matched character
/x;
pcre *pat = pcre_compile("(.) # any character, followed by a\n"
"\\1 # repeat of previously matched character\n",
PCRE_EXTENDED,
...);
pcre_exec(pat, NULL, "foobar", ...);
I'll add another so that I don't forget it : debuggex
It's good because it's very visual:
When I get stuck on a regex I usually turn to this: https://regexr.com/
Its perfect for quickly testing where something is going wrong.
I use Kodos - The Python Regular Expression Debugger:
Kodos is a Python GUI utility for creating, testing and debugging regular expressions for the Python programming language. Kodos should aid any developer to efficiently and effortlessly develop regular expressions in Python. Since Python's implementation of regular expressions is based on the PCRE standard, Kodos should benefit developers in other programming languages that also adhere to the PCRE standard (Perl, PHP, etc...).
(...)
Runs on Linux, Unix, Windows, Mac.
I think they don't. If your regexp is too complicated, and problematic to the point you need a debugger, you should create a specific parser, or use another method. It will be much more readable and maintainable.
There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author Dr. Edmund Weitz stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.
I've just seen a presentation of Regexp::Debugger by its creator: Damian Conway. Very impressive stuff: run inplace or using a command line tool (rxrx), interactively or on a "logged" execution file (stored in JSON), step forward and backward at any point, stop on breakpoints or events, colored output (user configurable), heat maps on regexp and string for optimization, etc...
Available on CPAN for free: http://search.cpan.org/~dconway/Regexp-Debugger/lib/Regexp/Debugger.pm
I use this online tool to debug my regex:
But yeah, it can't beat RegexBuddy.
I debug my regexes with my own eyes. That's why I use /x
modifier, write comments for them and split them in parts. Read Jeffrey Friedl's Mastering Regular Expressions to learn how to develop fast and readable regular expressions. Various regex debugging tools just provoke voodoo programming.
As for me I usually use pcretest utility which can dump the byte code of any regex, and usually it is much more easier to read (for me at least). Example:
PCRE version 8.30-PT1 2012-01-01
re> /ab|c[de]/iB
------------------------------------------------------------------
0 7 Bra
3 /i ab
7 38 Alt
10 /i c
12 [DEde]
45 45 Ket
48 End
------------------------------------------------------------------
If I'm feeling stuck, I like to go backward and generate the regex directly from a sample text using txt2re (although I usually end up tweaking the resulting regex by hand).
If you're a Mac user, I just came across this one:
http://atastypixel.com/blog/reginald-regex-explorer/
It's free, and simple to use, and it's been a great help for me to get to grips with RegExs in general.
Have a look at the (non-free) tools on regular-expressions.info. RegexBuddy in particular. Here is Jeff Atwood's post on the subject.
Writing reg exes using a notation like PCREs is like writing assembler: it's fine if you can just see the corresponding finite state automata in your head, but it can get difficult to maintain very quickly.
The reasons for not using a debugger are much the same as for not using a debugger with a programming language: you can fix local mistakes, but they won't help you solve the design problems that led you to make the local mistakes in the first place.
The more reflective way is to use data representations to generate regexps in your programming language, and have appropriate abstractions to build them. Olin Shiver's introduction to his scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.
I often use pcretest - hardly a "debugger" but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.
In general I agree with the guy above to whom needing a regex debugger is a code smell. For me the hardest about using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.
I often use Ruby based regexp tester Rubular
and also in Emacs use M-x re-builder
Firefox also has a useful extension
I use the Rx Toolkit included with ActiveState Komodo.
You could try this one http://www.pagecolumn.com/tool/regtest.htm
For me, after having eyeballed the regex (as I'm fairly fluent, and nearly always use /x or equivalent), I might debug rather than test if I am unsure if I would hit some degenerate matching (i.e. something that excessively backtracks) to see if I could solve such issues by modifying the greedyness of an operator for example.
To do that, I'd use one of the methods mentioned above: pcretest, RegexBuddy (if my current workplace has licensed it) or similar, and sometimes I time it in Linqpad if I'm working in C# regexes.
(The perl trick is a new one for me, so will probably add that to my regex toolkit too.)