I've developed a regex that matches pstops page specifications. (Regex whitespace not significant.)
^(?:(?<modulo>\d+):)?
(?<pages>
(?<pagespec>
(?<pageno>-?\d+)
(?<rotation>[RUL]?)?
(?:@(?<scale>\d*(?:\.\d+)))?
(?:\(
(?<xoff>\d*\.?\d+)(?<xunit>in|cm|w|h)?
,
(?<yoff>\d*\.?\d+)(?<yunit>in|cm|w|h)?
\))?
\+?)+,?
)+$
Sample string:
"4:1L@.7(21cm,0)+-2L@.7(21cm,14.85cm),1L(21cm,0)+-2L@.7(21cm,14.85cm)"
As you can see, there are nested named subgroups. A pagespec need not specify rotation, for example. I would like to be able to do something to the effect of this:
If match.Groups("pages").Captures(0).Groups("pagespec").Captures(1).Groups("rotation").Value > ""
but of course a Capture has no Groups property. Is there any way to access subgroups in the hierarchy in this way?
EDIT: Here is a more minimal example (whitespace significant this time):
(?<paragraph>(?:(?<sentence>The (?<child>boy|girl) is hungry\.|The (?<parent>mother|father) is angry\.)\s*)+)
Matched against this string:
The boy is hungry. The mother is angry. The girl is hungry.
produces one match. Within that match:
Groups("paragraph") has one capture matching the entire string.
Groups("sentence") has three captures.
Groups("child") has two captures, boy and girl.
Groups("parent") has one capture, mother.
But there is nothing that tells me that the single capture for parent lies within the second capture for sentence, unless I start looking at Index and Length for each capture.
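For reference, the Index and Length comparison could look roughly like the sketch below. It uses the minimal pattern and string above. The module name CaptureContainment and the helper IsInside are made up for illustration; only Regex.Match, Groups, Captures, Index, Length and Value come from System.Text.RegularExpressions.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaptureContainment
    Sub Main()
        Dim pattern As String =
            "(?<paragraph>(?:(?<sentence>The (?<child>boy|girl) is hungry\.|" &
            "The (?<parent>mother|father) is angry\.)\s*)+)"
        Dim subject As String =
            "The boy is hungry. The mother is angry. The girl is hungry."

        Dim m As Match = Regex.Match(subject, pattern)

        ' For each sentence capture, list the child/parent captures
        ' whose span lies inside that sentence's span.
        For Each sentence As Capture In m.Groups("sentence").Captures
            Console.WriteLine("sentence: " & sentence.Value)
            For Each groupName As String In {"child", "parent"}
                For Each inner As Capture In m.Groups(groupName).Captures
                    If IsInside(inner, sentence) Then
                        Console.WriteLine("  " & groupName & ": " & inner.Value)
                    End If
                Next
            Next
        Next
    End Sub

    ' Hypothetical helper: True when inner's span is contained in outer's span.
    Function IsInside(inner As Capture, outer As Capture) As Boolean
        Return inner.Index >= outer.Index AndAlso
               inner.Index + inner.Length <= outer.Index + outer.Length
    End Function
End Module
```

This rebuilds the hierarchy after the fact, so every extra nesting level needs another containment check. That is the bookkeeping I was hoping to avoid.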
EDIT: Here's the final answer:
^(?:(?<modulo>\d+):)?
(?<pages>
(?<pagespec>
(?<pageno>-?\d+)
(?<rotation>[RUL]?)
(?:@(?<scale>\d*(?:\.\d+)))?
(?:\(
(?<xoff>\d*\.?\d+)(?<xunit>in|cm|w|h)?
,
(?<yoff>\d*\.?\d+)(?<yunit>in|cm|w|h)?
\))?
(?<pageno>)(?<rotation>)(?<scale>)(?<xoff>)(?<xunit>)(?<yoff>)(?<yunit>)
\+?)+,?
(?<pagespec>)
)+
This pushes a NULL (an empty, zero-length capture) onto the pagespec stack after each pages capture, so that pagespec captures can be correlated with pages; and a NULL onto each of the other named stacks after each pagespec. Gee, parsing is hard ...
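To show how the delimiters can be read back, here is a minimal sketch of the same trick applied to the smaller boy/mother example rather than the full pstops pattern. The module name DelimitedCaptures and the helper SplitOnEmpty are hypothetical. The sketch assumes the real captures are never empty, so a zero-length capture can only be a delimiter. That does not hold for a group such as rotation, which may itself match the empty string; there the captures would have to be read in pairs instead.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module DelimitedCaptures
    Sub Main()
        ' The empty groups (?<child>) and (?<parent>) run once per sentence,
        ' so each stack receives a zero-length capture closing every sentence.
        Dim pattern As String =
            "(?:(?<sentence>The (?<child>boy|girl) is hungry\.|" &
            "The (?<parent>mother|father) is angry\.)" &
            "(?<child>)(?<parent>)\s*)+"
        Dim subject As String =
            "The boy is hungry. The mother is angry. The girl is hungry."

        Dim m As Match = Regex.Match(subject, pattern)
        Dim children As List(Of String) = SplitOnEmpty(m.Groups("child"))
        Dim parents As List(Of String) = SplitOnEmpty(m.Groups("parent"))

        ' Entry i of each list now belongs to sentence capture i.
        For i As Integer = 0 To m.Groups("sentence").Captures.Count - 1
            Console.WriteLine("{0} | child={1} | parent={2}",
                m.Groups("sentence").Captures(i).Value, children(i), parents(i))
        Next
        ' Prints:
        ' The boy is hungry. | child=boy | parent=(none)
        ' The mother is angry. | child=(none) | parent=mother
        ' The girl is hungry. | child=girl | parent=(none)
    End Sub

    ' Hypothetical helper: walks a group's captures in order and emits one
    ' entry per zero-length delimiter, holding the last non-empty value
    ' seen since the previous delimiter, or "(none)" if there was none.
    Function SplitOnEmpty(g As Group) As List(Of String)
        Dim result As New List(Of String)
        Dim current As String = Nothing
        For Each c As Capture In g.Captures
            If c.Length = 0 Then
                result.Add(If(current, "(none)"))
                current = Nothing
            Else
                current = c.Value
            End If
        Next
        Return result
    End Function
End Module
```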