I have a DataFrame that includes a column where every cell is made up of a list of dicts, and each list of dicts is of varying length (including 0).
An example:
df = pd.DataFrame({'ID' : [13423,294847,322844,429847], 'RANKS': [[{u'name': u'A', u'price': u'$1.00', u'rank': u'1'},
{u'name': u'B', u'price': u'$4.00', u'rank': u'2'},
{u'name': u'C', u'price': u'$3.99', u'rank': u'3'},
{u'name': u'D', u'price': u'$2.00', u'rank': u'4'},
{u'name': u'E', u'price': u'$2.50', u'rank': u'5'}],
[{u'name': u'AA', u'price': u'$1.99', u'rank': u'1'},
{u'name': u'BB', u'price': u'$6.99', u'rank': u'2'}],
[{u'name': u'Z', u'price': u'$0.99', u'rank': u'1'},
{u'name': u'Y', u'price': u'$10.00', u'rank': u'2'},
{u'name': u'X', u'price': u'$1.99', u'rank': u'3'}],[]], 'count' : [5,2,3,0]})
Note that 'count' is the number of dicts in 'RANKS.' The goal I had in mind was to creating a series of additional dataframes/tables (one for each 'rank') and link these to the main table in a HDFStore. Something like:
Rank_2
ID Price Name
13423 $4.00 B
294847 $6.99 BB
322844 $10.99 Y
429847 NaN NaN
Rank_3
ID Price Name
13423 $3.99 C
294847 NaN NaN
322844 $1.99 X
429847 NaN NaN
This way I could easy query on ID and rank if needed, but the main table wouldn't get cluttered with the unwinding of this hierarchical data.
The problem, however, is I cannot figure out how to create the DataFrames from this column. I've tried a number of things, the first (to be nested in a for-loop if it worked, but of course it did not):
Rank_1 = pd.DataFrame(df.loc[df['count'] > 0]['RANKS'].map(lambda x: pd.DataFrame(x[0])))
And, the second, since price is the most important piece to me:
for i in range(0,5):
df['rank_%s' % str(i+1)] = df[df['count'] > i]['RANKS'].map(lambda x: x[i]['price'].strip('$'))
Then convert to float. This works but is a pretty big compromise. Is there an efficient way (that won't get hung up on NaNs) to accomplish my goal of separate DataFrames for each rank?