I just learned about `re.Scanner` while looking for a way to parse a series of lines whose format is somewhat flexible. I don't know exactly what it was designed for, but it looks like exactly what I need. However, I've run into a problem.
I define my scanner:
```python
import re

scanner = re.Scanner([
    (r"([0-9]+(?:\ h|h))", lambda scanner, token: ("HOURS", token)),
])
results, remainder = scanner.scan(line)  # line holds one input line
```
This should find something like `1h` or `1 h` in the supplied string, but it only works when the hour is at the beginning of the string.
Passing in:

```
1 h words words words
bla 2 h words words
```
only the hour in the first line is parsed; the `2 h` in the second line is never found. Since I couldn't find any documentation for `Scanner`, I assumed it would find a match anywhere in the supplied string, but it seems to match only at the beginning. It also seems to ignore some of the standard regex constructs, such as `()` for capturing groups and `(?:)` for non-capturing groups.
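A minimal sketch of what appears to be happening, assuming a current CPython where `re.Scanner` is still present but undocumented: `scan()` tries its patterns only at the current position (index 0 first, then wherever the previous match ended) and stops at the first position where none of them match, returning the unscanned rest as the remainder. Adding catch-all rules for whitespace and other words lets it walk the whole line. The action is called with the scanner and the full matched text, so groups in a pattern don't change the token; the sketch avoids capturing groups because the scanner uses its own group numbering to decide which action to run. The names `hours_only`, `tokenizer`, and the `WORD` token are hypothetical, not part of the original code.

```python
import re

# Same idea as the scanner above, but without the capturing group:
# "[0-9]+ ?h" matches "1h" or "1 h", just like "[0-9]+(?:\ h|h)".
hours_only = re.Scanner([
    (r"[0-9]+ ?h", lambda scanner, token: ("HOURS", token)),
])

# scan() tries the patterns at index 0, then wherever the previous match
# ended, and stops at the first position where no pattern matches.
print(hours_only.scan("1 h words words words"))
# -> ([('HOURS', '1 h')], ' words words words')
print(hours_only.scan("bla 2 h words words"))
# -> ([], 'bla 2 h words words')

# Catch-all rules let the scanner walk over the whole line.
# An action of None means "consume this text but emit no token".
tokenizer = re.Scanner([
    (r"[0-9]+ ?h", lambda scanner, token: ("HOURS", token)),
    (r"\s+", None),
    (r"\S+", lambda scanner, token: ("WORD", token)),  # hypothetical token name
])

results, remainder = tokenizer.scan("bla 2 h words words")
hours = [token for kind, token in results if kind == "HOURS"]
print(hours)             # -> ['2 h']
print(repr(remainder))   # -> '' (the whole line was consumed)
```

Order matters in the lexicon: at each position the patterns are tried in the order listed, so the `HOURS` rule must come before the generic `\S+` rule.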
Should I be looking somewhere else? Is it a bad idea to rely on a class that is undocumented and doesn't look like it will become an official part of Python?
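If the goal is only to find hour values anywhere in a line, the documented `finditer` method of a compiled pattern may be a simpler fit: it searches the whole string rather than only from the start, and ordinary capturing groups work as usual, so `m.group(1)` yields just the number. This is a sketch under stated assumptions: `find_hours` is a hypothetical helper name, and converting the value to `int` is an assumption about what the caller needs.

```python
import re

# Compiled once; the capturing group isolates the number before the "h".
HOURS = re.compile(r"([0-9]+) ?h")

def find_hours(line):
    """Return the hour values (as ints) found anywhere in `line` (hypothetical helper)."""
    return [int(m.group(1)) for m in HOURS.finditer(line)]

print(find_hours("1 h words words words"))  # -> [1]
print(find_hours("bla 2 h words words"))    # -> [2]
print(find_hours("1h and 12 h"))            # -> [1, 12]
```

Like the original pattern, this also matches the `2 h` prefix of text such as `2 hours`; appending `\b` after the `h` would exclude that case if it matters.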