My data are similar to those in this post: "pandas: Filling missing values within a group".
That is, I have data from a number of observation sessions, each with one focal individual. The focal individual is noted only once per session, but I want to fill in the focal ID on every row of that session. The data look something like this:
   Focal  Session
0    NaN        1
1  50101        1
2    NaN        1
3    NaN        2
4  50408        2
5    NaN        2
Based on the post linked above, I was using this code:
g = data.groupby('Session')
g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
But this raises a KeyError (specifically, KeyError: None). According to the .loc documentation, a KeyError is raised when the requested label is not found in the index. I checked, and while I have 152 sessions, I have only 150 non-null values in the Focal column. Before I search my data by hand for the sessions that are missing a Focal ID, I have two questions:
I am very much a beginner, so: is this a reasonable explanation for why I am getting a KeyError?
If it is, is there a way to find out which Session is missing Focal ID data, so that I don't have to look through the data manually?
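For reference, here is a minimal sketch of one way the second question might be checked. It assumes a DataFrame named data with the Focal and Session columns shown above; the extra Session 3 with no Focal ID is a hypothetical addition, made only to reproduce the situation. It relies on count() ignoring NaN, so a per-session count of 0 would flag a session with no Focal ID.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the example above, plus a hypothetical Session 3
# that has no Focal ID at all (an assumption, added to mimic the problem).
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})

# first_valid_index() returns None when a Series contains only NaN,
# and passing None to .loc is what produces "KeyError: None".
all_nan = data.loc[data["Session"] == 3, "Focal"]
print(all_nan.first_valid_index())  # None

# count() skips NaN, so a count of 0 marks a session with no Focal ID.
focal_counts = data.groupby("Session")["Focal"].count()
missing_sessions = focal_counts[focal_counts == 0].index.tolist()
print(missing_sessions)  # [3]
```

The same focal_counts Series could also be inspected for sessions with more than one non-null Focal value, which would be another way for the totals not to line up.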
Output here:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-330-0e4f27aa7e14> in <module>()
----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
//anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
1540 for name, group in self:
1541 object.__setattr__(group, 'name', name)
-> 1542 res = wrapper(group)
1543 # result[group.index] = res
1544 indexer = self.obj.index.get_indexer(group.index)
//anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in <lambda>(x)
1536 wrapper = lambda x: getattr(x, func)(*args, **kwargs)
1537 else:
-> 1538 wrapper = lambda x: func(x, *args, **kwargs)
1539
1540 for name, group in self:
<ipython-input-330-0e4f27aa7e14> in <lambda>(s)
----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
669 return self._getitem_tuple(key)
670 else:
--> 671 return self._getitem_axis(key, axis=0)
672
673 def _getitem_axis(self, key, axis=0):
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
756 return self._getitem_iterable(key, axis=axis)
757 else:
--> 758 return self._get_label(key, axis=axis)
759
760 class _iLocIndexer(_LocationIndexer):
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
58 return self.obj._xs(label, axis=axis, copy=False)
59 except Exception:
---> 60 return self.obj._xs(label, axis=axis, copy=True)
61
62 def _get_loc(self, key, axis=0):
//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
570
571 def _xs(self, key, axis=0, level=None, copy=True):
--> 572 return self.__getitem__(key)
573
574 def _ixs(self, i, axis=0):
//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
611 def __getitem__(self, key):
612 try:
--> 613 return self.index.get_value(self, key)
614 except InvalidIndexError:
615 pass
//anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
761 """
762 try:
--> 763 return self._engine.get_value(series, key)
764 except KeyError, e1:
765 if len(self) > 0 and self.inferred_type == 'integer':
//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2565)()
//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2380)()
//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3166)()
KeyError: None
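
The last frames of the traceback show .loc being asked to look up the label None, which is what first_valid_index() returns for a group whose Focal values are all NaN. Below is a minimal sketch of a guarded variant of the transform. It assumes toy data with a hypothetical Session 3 that has no Focal ID, and the helper name first_valid_or_nan is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; Session 3 is a hypothetical session with no Focal ID recorded.
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})


def first_valid_or_nan(s):
    """Return the first non-null value in s, or NaN if s has none."""
    idx = s.first_valid_index()
    # Guard against the all-NaN case instead of calling s.loc[None].
    return np.nan if idx is None else s.loc[idx]


# The scalar returned for each group is broadcast to every row of that group.
filled = data.groupby("Session")["Focal"].transform(first_valid_or_nan)
print(filled)
```

With this guard, sessions that do have a Focal ID are filled and sessions without one stay NaN, so they can be looked up afterwards rather than stopping the whole transform.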