I am working on a program that lets me run diagnostics on, and extract data from, files that contain hundreds of Excel sheets. I am using the DataNitro add-in.
My program has two phases:
Phase I (diagnostic): the program reads every cell in certain user-specified columns on every sheet and writes dictionaries and lists to an Excel tab. Examples of these lists are the starting and ending rows of the data to be extracted, and lists of sheets and observations that are missing or have unusual values. These objects serve two purposes: 1) to let the user identify values that should be checked before the data are extracted, and 2) to describe the data structure of each sheet.
Phase II (data extraction): the program uses the objects created in Phase I to extract the data from the individual sheets and compile it on a new sheet.
The program works wonderfully, and I am able to extract thousands of observations very quickly (especially considering that my predecessor in this job was doing it by hand!).
So here is the question:
Phase II depends on the objects created in Phase I. This means that once the user has iterated the diagnostic phase enough times to be happy that the data are ready to be extracted, the whole program has to be run again from the beginning. This is definitely inefficient, especially as the diagnostic phase can take several minutes. So what I want to know is: is there some way to read the lists and dictionaries that are stored as cell values back into Python as the object types they are intended to be? If there is, I can take the Phase I objects as they appear in the Excel sheet and pass them directly to Phase II without having to rerun the Phase I program.
For example, one of the dictionaries has keys that identify the sheet, and each value is itself a list of dictionaries whose keys are observation names and whose values are lists of the rows on which that observation can be found:
{'Sheet1': [{'OBSERVATION4': [12, 13, 14]}, {'OBSERVATION12': [29, 31, 35]}], 'Sheet12': [{'OBSERVATION7': [6, 7, 9]}, {'OBSERVATION18': [40, 44, 48]}]}
In the Excel cell this just appears as a string.
How can I read this kind of nested object directly back into Python?
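To make the goal concrete, here is a minimal sketch of the round trip being asked about, assuming the cell holds exactly the text that str() produces for the object. It uses ast.literal_eval from the Python standard library, which parses Python literals only (strings, numbers, lists, dictionaries and so on) and does not execute arbitrary code. The names cell_text and rows_by_sheet are hypothetical; cell_text stands in for the value read from the Excel cell, and the DataNitro call that would fetch it is not shown.

```python
import ast

# Hypothetical stand-in for the string read back from the Excel cell.
cell_text = (
    "{'Sheet1': [{'OBSERVATION4': [12, 13, 14]}, {'OBSERVATION12': [29, 31, 35]}], "
    "'Sheet12': [{'OBSERVATION7': [6, 7, 9]}, {'OBSERVATION18': [40, 44, 48]}]}"
)

# literal_eval parses the text as a Python literal and rebuilds the nested object.
rows_by_sheet = ast.literal_eval(cell_text)

print(type(rows_by_sheet).__name__)
print(rows_by_sheet['Sheet1'][0]['OBSERVATION4'])
```

This only works if every value in the stored object is a plain literal; anything else would need a different serialization format, such as JSON.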
Thanks