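Here is one way to do it. I would add two columns, "idx" and "max": "idx" is a running issue number within each ISSN, computed as (volume - 1) * (largest issue number for that ISSN) + issue, and "max" is the largest "idx" for that ISSN.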
In [452]: df['idx'] = df.groupby(['issn'], group_keys=False).apply(lambda sdf: (sdf.volume - 1) * sdf.issue.max() + sdf.issue)
In [453]: df
Out[453]:
        issn  year  volume  issue  idx
0  1234-x000  2013       1      2    2
1  1234-x000  2013       1      1    1
2  1234-x000  2012       6      2   12
3  1234-x000  2012       6      1   11
4  1234-x000  2012       5      2   10
5  4321-yyyy  2013       2      1    2
6  4321-yyyy  2013       1      1    1
7  4321-yyyy  2012      12      1   12
8  4321-yyyy  2012      11      1   11
In [454]: df['max'] = df.groupby(['issn']).idx.transform(lambda s: s.max())
In [455]: df
Out[455]:
        issn  year  volume  issue  idx  max
0  1234-x000  2013       1      2    2   12
1  1234-x000  2013       1      1    1   12
2  1234-x000  2012       6      2   12   12
3  1234-x000  2012       6      1   11   12
4  1234-x000  2012       5      2   10   12
5  4321-yyyy  2013       2      1    2   12
6  4321-yyyy  2013       1      1    1   12
7  4321-yyyy  2012      12      1   12   12
8  4321-yyyy  2012      11      1   11   12
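The same two columns can also be built without apply. This is only a minimal sketch, assuming a reasonably recent pandas and a sample frame reconstructed from the output above. The helper name issues_per_volume is mine, not part of the original answer. It relies on groupby(...).transform('max'), which returns a result aligned to the original row index, so the assignment lines up directly.

```python
import pandas as pd

# Sample data reconstructed from the output shown above.
df = pd.DataFrame({
    "issn": ["1234-x000"] * 5 + ["4321-yyyy"] * 4,
    "year": [2013, 2013, 2012, 2012, 2012, 2013, 2013, 2012, 2012],
    "volume": [1, 1, 6, 6, 5, 2, 1, 12, 11],
    "issue": [2, 1, 2, 1, 2, 1, 1, 1, 1],
})

# Largest issue number seen per ISSN, broadcast back to every row
# (hypothetical helper name, used only for this illustration).
issues_per_volume = df.groupby("issn")["issue"].transform("max")

# Running issue number within each ISSN.
df["idx"] = (df["volume"] - 1) * issues_per_volume + df["issue"]

# Largest running issue number per ISSN.
df["max"] = df.groupby("issn")["idx"].transform("max")

print(df)
```

Like the original, this assumes every volume of a given ISSN has the same number of issues, namely the largest issue number observed for that ISSN.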
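The previous answer provides the rest: for each (issn, year) group, take the set difference between the expected range of indices and the ones actually present.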
In [462]: df.groupby(['issn', 'year']).apply(lambda sdf: np.setdiff1d(range(1, sdf['max'].iloc[0]), sdf.idx).tolist())
Out[462]:
issn       year
1234-x000  2012    [1, 2, 3, 4, 5, 6, 7, 8, 9]
           2013    [3, 4, 5, 6, 7, 8, 9, 10, 11]
4321-yyyy  2012    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
           2013    [3, 4, 5, 6, 7, 8, 9, 10, 11]
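The set-difference step can be sketched as a standalone script. This is an illustration under assumptions, not the original author's code: the frame is rebuilt from the output above with "idx" and "max" already filled in, the function name missing_indices is hypothetical, and the groups are iterated explicitly instead of going through apply.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the output above, with idx and max already computed.
df = pd.DataFrame({
    "issn": ["1234-x000"] * 5 + ["4321-yyyy"] * 4,
    "year": [2013, 2013, 2012, 2012, 2012, 2013, 2013, 2012, 2012],
    "idx": [2, 1, 12, 11, 10, 2, 1, 12, 11],
    "max": [12] * 9,
})


def missing_indices(group):
    """Return the indices in 1..max-1 that do not occur in this group."""
    # Same bounds as the original: the upper end (max itself) is excluded.
    expected = np.arange(1, group["max"].iloc[0])
    return np.setdiff1d(expected, group["idx"]).tolist()


# One entry per (issn, year) pair.
missing = {
    key: missing_indices(group)
    for key, group in df.groupby(["issn", "year"])
}

for (issn, year), gaps in missing.items():
    print(issn, year, gaps)
```

Note that range(1, max) stops one short of max, which is why 12 never appears in the 2013 lists. If the last index should count as expected too, the upper bound would need to be max + 1.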