So I have input coming in as follows: 12_34 5_6_8_2 4_____3 1234
and the output I need from it is: 1234, 5682, 43, 1234
I'm currently working with r'[0-9]+[0-9_]*'.replace('_',''), which, as far as I can tell, successfully rejects any input that is not a combination of digits and underscores, where an underscore cannot be the first character.
However, replacing the _ with the empty string causes 12_34 to come out as 12 and 34.
Is there a better method than 'replace' for this? Or could I adapt my regex to deal with this problem?
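One likely cause of the split, sketched here with only the standard `re` module: calling `.replace('_','')` on the pattern string edits the regex itself, so the underscore drops out of the character class before any matching happens. A minimal sketch of the alternative, which is to match with the underscore still in the pattern and then strip underscores from each matched text. The names `NUMBER_RE` and `extract_numbers` are made up for illustration.

```python
import re

# Calling .replace on the pattern string rewrites the regex, not the matches:
broken_pattern = r'[0-9]+[0-9_]*'.replace('_', '')
print(broken_pattern)                       # [0-9]+[0-9]*
print(re.findall(broken_pattern, '12_34'))  # ['12', '34']

# Hypothetical helper: keep '_' in the pattern, clean each match afterwards.
NUMBER_RE = re.compile(r'[0-9][0-9_]*')

def extract_numbers(text):
    """Return every digit/underscore run in text as an int, underscores removed."""
    return [int(match.replace('_', '')) for match in NUMBER_RE.findall(text)]

print(extract_numbers('12_34 5_6_8_2 4_____3 1234'))  # [1234, 5682, 43, 1234]
```

The pattern `[0-9][0-9_]*` accepts the same strings as `[0-9]+[0-9_]*`; it is only a slightly shorter spelling.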
EDIT: While responding to questions in the comments below, I realised this might be better specified up here. The broad aim is to take a long input string (small example: "12_34 + 'Iamastring#' I_am_an_Ident") and return:

    ('NUMBER', 1234), ('PLUS', '+'), ('STRING', 'Iamastring#'), ('IDENT', 'I_am_an_Ident')

I didn't want to go through all that because I've got it all working as specified, except for NUMBER. The solution code looks something like:

    tokens = ('PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'IDENT', 'STRING', 'NUMBER')
    t_PLUS = r'\+'
    t_MINUS = '-'

and so on, down to:

    t_NUMBER = ###code goes here

I'm not sure how to put multi-line processes into t_NUMBER.
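The `tokens` tuple and `t_` names look like PLY (Python Lex-Yacc) conventions. Assuming that is the library in use, a token rule can be written as a function instead of a string: PLY takes the function's docstring as the regex and passes in the matched token, so any multi-line processing can go in the body. A minimal sketch; the `SimpleNamespace` stand-in at the bottom only imitates a PLY token so the rule can be tried without the library installed.

```python
import re
from types import SimpleNamespace

def t_NUMBER(t):
    r'[0-9][0-9_]*'
    # Under PLY, the docstring above is the token's regex and t.value holds
    # the matched text; strip the underscores and convert to an int.
    t.value = int(t.value.replace('_', ''))
    return t

# Stand-in for a PLY token (an assumption, used only to exercise the rule).
tok = SimpleNamespace(type='NUMBER', value='12_34')
assert re.fullmatch(t_NUMBER.__doc__, tok.value)
print((tok.type, t_NUMBER(tok).value))  # ('NUMBER', 1234)
```

With the real library, this function would sit alongside the other `t_` rules in the lexer module in place of a `t_NUMBER = ...` string assignment.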