I'm working on a tool that converts Python source code into a nicely formatted HTML file. Basically, it reads a Python file line by line, inspects each line to determine what's in it, and then adds the right <span> tags with colors, line breaks and whatnot.
I have the general structure of the program in place; now I'm writing the functions that take a string and return an HTML-enriched string.
I'm stuck on parsing strings that have quotes in them, e.g.:
x = 'hello there'
if x == 'example "quotes" inside quotes' and y == 'another example':
My approach so far has been to enumerate the string to collect the indices of the single quotes into a list, and then use two while loops to put the right HTML tags in the right places. It seemed to work fine when the line contained a single quoted string, but all hell broke loose when I introduced two quoted strings on a line, quotes inside quotes, or finally a string made up of '\''.
It seems this route is a dead end. I'm now thinking of turning to .split(), shlex, or re, breaking the string down into a list and trying to work with that.
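To make the re idea concrete, here is a minimal sketch of what I have in mind; it is not working code from my tool. The names highlight_strings and STRING_RE and the CSS class "str" are placeholders I made up for illustration. The assumption is that a string literal starts at a quote and runs to the next unescaped quote of the same kind, so the other kind of quote and backslash-escaped quotes stay inside the match. It deliberately ignores triple-quoted strings, string prefixes like r'' or b'', and quotes that appear inside comments.

```python
import html
import re

# Hypothetical pattern: a single- or double-quoted literal in which a
# backslash escapes the next character (so '\'' is one literal).
STRING_RE = re.compile(
    r"'(?:\\.|[^'\\])*'"   # single-quoted literal
    r"|"
    r'"(?:\\.|[^"\\])*"'   # double-quoted literal
)


def highlight_strings(line, css_class="str"):
    """Wrap each quoted literal in `line` in a <span>, escaping the rest for HTML."""
    parts = []
    pos = 0
    for match in STRING_RE.finditer(line):
        # Text before the literal: escape &, < and > but leave it unwrapped.
        parts.append(html.escape(line[pos:match.start()], quote=False))
        # The literal itself, quotes included, goes inside the span.
        literal = html.escape(match.group(0), quote=False)
        parts.append('<span class="%s">%s</span>' % (css_class, literal))
        pos = match.end()
    # Whatever follows the last literal.
    parts.append(html.escape(line[pos:], quote=False))
    return "".join(parts)


print(highlight_strings("""if x == 'example "quotes" inside quotes' and y == 'another example':"""))
```

Because the tags are built up from the matches rather than inserted at precomputed indices, adding a span never shifts the positions of the later literals, which is where the index-based version kept going wrong.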
I would really appreciate tips, pointers, and any advice.
Edit: Also, to make it clearer, I need to put HTML tags in the right places in a string. Working with string indices didn't get me far with more complex strings.