I'm working on a data mining problem for my master's thesis. I'm using Python for data analysis, but I have no experience with Pandas, which I need to convert my data to a DataFrame. To do survival regression with a Python package called Lifelines, I need to create a covariate matrix from my experiment_data dict, which contains over 16k dicts with Twitter data about Kickstarter projects (see the example dict below).
16041: {'goal': 1200, 'launch': 1353544772, 'days-before-deadline': 3, 'followers': 149, 'date-funded': 1355887690.9189188, 'id': 52687, 'tweet_ids': [280965208409796608, ... n], 'state': 1, 'deadline': 1356136772, 'retweets': 0, 'favorites': 0, 'duration': 31, 'timestamps': [1355876412.0], 'favourites': 0, 'runtime': 27, 'friends': 127, 'pledges': [0.0, 0.0625, 0.0625, ... n], 'statuses': 7460}
Once I have a Pandas DataFrame built from this dict, I should be able to create the covariate matrix with Patsy, for example like this:
X = patsy.dmatrix('friends + followers + retweets + favorites - 1', data, return_type='dataframe')
My question is: how do I create a Pandas DataFrame from the experiment_data dict of dicts? The keys of the inner dictionaries (goal, launch, followers, etc.) should become the columns, with one row per Kickstarter project (i.e. index 0 to 16041).
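To make the target layout concrete, here is a minimal sketch of what I think the conversion could look like, assuming `pd.DataFrame.from_dict` with `orient='index'` is the right tool. The two-project sample below is hypothetical (the values and the reduced set of keys are only for illustration); the real experiment_data has the full set of keys shown in the example dict.

```python
import pandas as pd

# Hypothetical two-project sample shaped like experiment_data
# (illustrative values, only a subset of the real keys).
experiment_data = {
    0: {'goal': 5000, 'followers': 10, 'friends': 20,
        'retweets': 1, 'favorites': 2, 'pledges': [0.0, 0.1]},
    1: {'goal': 1200, 'followers': 149, 'friends': 127,
        'retweets': 0, 'favorites': 0, 'pledges': [0.0, 0.0625]},
}

# orient='index' turns each outer key into a row label
# and each inner key into a column.
df = pd.DataFrame.from_dict(experiment_data, orient='index')

# List-valued fields such as 'pledges' end up as object columns,
# so select only the scalar covariates for the design matrix.
covariates = df[['friends', 'followers', 'retweets', 'favorites']]
print(covariates)
```

If this works on the full dict, `df` could presumably be passed as the `data` argument to `patsy.dmatrix` in the call above.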
Any help would be really appreciated. Thanks in advance!
P.S. If you have experience in Survival Regression using Python and Lifelines, please let me know!