I have a 10,000-line source file with tons of duplication, so I read the file in as text.
Example:
assert PyArray_TYPE(real0) == np.NPY_DOUBLE, "real0 is not double"
assert real0.ndim == 1, "real0 has wrong dimensions"
if not (PyArray_FLAGS(real0) & np.NPY_C_CONTIGUOUS):
    real0 = PyArray_GETCONTIGUOUS(real0)
real0_data = <double*>real0.data
I want to replace all occurrences of this pattern with
real0_data = _get_data(real0, "real0")
where real0 can be any variable name matching [a-z0-9]+.
Don't get distracted by the source code itself; what the code does doesn't matter. This is purely a text-processing and regex problem.
This is what I have so far:
PATH = "func.pyx"
source_string = open(PATH,"r").read()
pattern = r"""
assert PyArray_TYPE\(([a-z0-9]+)\) == np.NPY_DOUBLE, "([a-z0-9]+) is not double"
assert ([a-z0-9]+).ndim == 1, "([a-z0-9]+) has wrong dimensions"
if not (PyArray_FLAGS(([a-z0-9]+)) & np.NPY_C_CONTIGUOUS):
    ([a-z0-9]+) = PyArray_GETCONTIGUOUS(([a-z0-9]+))
([a-z0-9]+)_data = ([a-z0-9]+).data"""
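For comparison, here is a minimal sketch of one way the attempt above might be completed. It is not a definitive solution. It rests on a few assumptions: the last line of each block contains the `<double*>` cast shown in the example, lines are separated by a newline with optional indentation, and the same variable name must appear in every position (enforced with a named backreference instead of nine independent groups). The names `TEMPLATE_LINES`, `build_pattern`, and `collapse_blocks` are hypothetical helpers introduced only for illustration.

```python
import re

# The block to find, written literally, with VAR standing in for the
# variable name. (VAR is a placeholder chosen for this sketch.)
TEMPLATE_LINES = [
    'assert PyArray_TYPE(VAR) == np.NPY_DOUBLE, "VAR is not double"',
    'assert VAR.ndim == 1, "VAR has wrong dimensions"',
    'if not (PyArray_FLAGS(VAR) & np.NPY_C_CONTIGUOUS):',
    'VAR = PyArray_GETCONTIGUOUS(VAR)',
    'VAR_data = <double*>VAR.data',
]


def build_pattern(lines):
    """Turn the literal template into a compiled regex."""
    # re.escape handles the parentheses, dots, '&' and '*' that would
    # otherwise be read as regex syntax.
    escaped = [re.escape(line) for line in lines]
    # Between lines: optional trailing spaces, a newline, optional indent.
    body = r"[ \t]*\n[ \t]*".join(escaped)
    # The first VAR captures the name; every later VAR must repeat it.
    body = body.replace("VAR", r"(?P<var>[a-z0-9]+)", 1)
    body = body.replace("VAR", r"(?P=var)")
    return re.compile(body)


PATTERN = build_pattern(TEMPLATE_LINES)
REPLACEMENT = r'\g<var>_data = _get_data(\g<var>, "\g<var>")'


def collapse_blocks(source):
    """Replace every matching five-line block with one _get_data call."""
    return PATTERN.sub(REPLACEMENT, source)
```

Because the match starts at `assert` and not at the beginning of the line, the indentation in front of the first line of each block is left in place and carries over to the replacement line. Usage would be along the lines of `new_source = collapse_blocks(source_string)`.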