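Suppose I have a pandas Series and want to take the mean of every group of 8 rows. I have no prior knowledge of the size of the Series, and the index may not start at 0. I currently have the following: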

import numpy as np
import pandas as pd

N = 8
s = pd.Series(np.random.random(50 * N))

n_sets = s.shape[0] // N  # number of complete groups of N rows

# Start and stop positions of each group
split = ([m * N for m in range(n_sets)],
         [m * N for m in range(1, n_sets + 1)])

out_array = np.zeros(n_sets)
for i, (a, b) in enumerate(zip(*split)):
    out_array[i] = s.loc[s.index[a:b]].mean()

Is there a shorter way to do this?

1 Answer

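You can use groupby: floor-divide the index by N (the // operator) to build the group key, then call pd.Series.mean(). Note that this key assumes the default integer index starting at 0, and that a trailing group with fewer than N rows is averaged too rather than dropped.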

newout_array = s.groupby(s.index // N).mean().to_list()

Output:

out_array  #original solution
[0.42147899 0.55668055 0.5222594  0.46066426 0.44378491 0.52719371
 0.42479113 0.46485387 0.2800083  0.57174865 0.59207811 0.58665479
 0.52414851 0.38158931 0.51884761 0.59007469 0.3449512  0.56385373
 0.34359674 0.44524997 0.44175351 0.42339394 0.5687501  0.3140091
 0.40985639 0.46649486 0.3101396  0.45664647 0.51829052 0.38875796
 0.45428001 0.52979064 0.62545921 0.64782618 0.65265239 0.56976799
 0.64277369 0.33528876 0.45973874 0.45341751 0.52690983 0.66427599
 0.59814577 0.35575622 0.62995929 0.61582329 0.38971679 0.4771326
 0.50889137 0.25105353]


newout_array  #new solution

[0.4214789945860148, 0.5566805507021909, 0.5222593998859411, 0.46066425607167216, 0.4437849132421554, 0.5271937114894408,
 0.424791134573943, 0.4648538659945887, 0.28000829556024387, 0.5717486453029332, 0.5920781058695997, 0.5866547941460012, 
 0.5241485100329547, 0.38158931177460725, 0.5188476113762392, 0.5900746905953183, 0.34495119855714756, 0.5638537286251522, 
 0.3435967359945349, 0.44524997190104454, 0.44175351484451975, 0.42339393886425913, 0.5687501027416468, 0.3140090963728155, 
 0.40985639015924036, 0.4664948621046134, 0.3101396034068746, 0.45664647332866076, 0.5182905157666298, 0.38875796468438406, 
 0.4542800111275337, 0.5297906368971982, 0.6254592119278896, 0.6478261817988752, 0.6526523935382951, 0.569767994485338, 
 0.642773691835847, 0.3352887578683835, 0.45973873832126594, 0.45341751320112617, 0.5269098312525405, 0.6642759923683706, 
 0.5981457683986061, 0.3557562229383897, 0.6299592930489117, 0.6158232897272005, 0.38971678834383916, 0.4771325988592886, 
 0.5088913710936904, 0.25105352820427246]
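The question mentions that the index may not start at 0, in which case a key built from s.index no longer lines up with blocks of N rows. One possible workaround is to group by row position instead. This is only a minimal sketch: the example Series, its index starting at 100, and the names positions, n_full and block_means are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

N = 8

# Illustrative Series: the index starts at 100 and the length (50 * N + 3)
# is deliberately not a multiple of N. Both choices are assumptions.
values = np.arange(50 * N + 3, dtype=float)
s = pd.Series(values, index=np.arange(100, 100 + len(values)))

# Group by row position rather than by index label.
positions = np.arange(len(s))
n_full = len(s) // N  # number of complete groups of N rows

# Keep only the rows that fall in complete groups, mirroring the loop in
# the question, which ignores a trailing partial group.
trimmed = s.iloc[: n_full * N]
block_means = trimmed.groupby(positions[: n_full * N] // N).mean().to_list()
```

On a Series with the default index and a length that is a multiple of N, this should give the same values as the one-liner above.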

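The only difference between out_array and newout_array is the number of decimal places displayed: NumPy prints arrays with 8 digits of precision by default, while the list shows each float in full. If you want to keep only 8 decimal places, as in the original out_array, you can map round over the elements: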

newout_array = s.groupby(s.index // N).mean().to_list()
newout_array = list(map(lambda x: round(x, 8), newout_array))
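As an alternative to mapping round over a list, the rounding can probably be done on the Series itself with pd.Series.round before converting to a list. A small sketch, where the sample data and the name rounded are made up for illustration:

```python
import numpy as np
import pandas as pd

N = 8

# Illustrative data with the default 0-based index: 0/3, 1/3, ..., 23/3.
s = pd.Series(np.arange(3 * N, dtype=float) / 3.0)

# Round the group means to 8 decimal places, then convert to a list.
rounded = s.groupby(s.index // N).mean().round(8).to_list()
```

This keeps the whole computation in one chained expression and avoids the Python-level lambda.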
Answered 2020-07-10T01:46:06.010