
Is there a Pythonic way to build a list containing the running average of some function?

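After reading a fun little article about Martians, black boxes, and the Cauchy distribution, I thought it would be fun to compute a running average of the Cauchy distribution myself: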

import math 
import random

def cauchy(location, scale):
    p = 0.0
    while p == 0.0:
        p = random.random()
    return location + scale*math.tan(math.pi*(p - 0.5))

# is this next block of code a good way to populate running_avg?
sum = 0
count = 0
max = 10
running_avg = []
while count < max:
    num = cauchy(3,1)
    sum += num
    count += 1
    running_avg.append(sum/count)

print running_avg     # or do something else with it, besides printing

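I think this approach works, but I'm curious whether there is a more elegant way to build the running_avg list than using a loop and counters (a list comprehension, for example).

There are some related questions, but they address more complicated problems (small window size, exponential weighting) or aren't specific to Python: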


3 Answers


You could write a generator:

def running_average():
  sum = 0
  count = 0
  while True:
    sum += cauchy(3,1)
    count += 1
    yield sum/count

Or, given a generator of Cauchy numbers and a utility function that yields running sums, you can have a neat generator expression:

# Cauchy numbers generator
def cauchy_numbers():
  while True:
    yield cauchy(3,1)

# running sum utility function
def running_sum(iterable):
  sum = 0
  for x in iterable:
    sum += x
    yield sum

# Running averages generator expression (** the neat part **)
running_avgs = (sum/(i+1) for (i,sum) in enumerate(running_sum(cauchy_numbers())))

# goes on forever
for avg in running_avgs:
  print avg

# alternatively, take just the first 10
import itertools
for avg in itertools.islice(running_avgs, 10):
  print avg
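
On Python 3.2 and later, the standard library's itertools.accumulate can stand in for the hand-written running_sum helper above. The following is a minimal sketch under that assumption and is not part of the original answer; the name running_averages and the use of enumerate with a start of 1 are choices made here for illustration.

```python
import itertools
import math
import random


def cauchy(location, scale):
    # Same sampler as in the question: transform a uniform draw with tan().
    p = 0.0
    while p == 0.0:
        p = random.random()
    return location + scale * math.tan(math.pi * (p - 0.5))


def running_averages(iterable):
    # accumulate() yields running totals; enumerate(..., 1) supplies the count.
    for count, total in enumerate(itertools.accumulate(iterable), 1):
        yield total / count


# Works lazily on an endless stream; islice takes the first 10 averages.
samples = (cauchy(3, 1) for _ in itertools.count())
first_ten = list(itertools.islice(running_averages(samples), 10))
print(first_ten)
```

Because running_averages accepts any iterable, the same function also handles a plain list of numbers.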
Answered 2009-11-24T14:55:39.463

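You could use coroutines. They are similar to generators, but allow you to send in values. Coroutines were added in Python 2.5, so this won't work in earlier versions.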

def running_average():
    sum = 0.0
    count = 0
    value = yield(float('nan'))
    while True:
        sum += value
        count += 1
        value = yield(sum/count)

ravg = running_average()
next(ravg)   # advance the coroutine to the first yield

for i in xrange(10):
    avg = ravg.send(cauchy(3,1))
    print 'Running average: %.6f' % (avg,)

As a list comprehension:

ravg = running_average()
next(ravg)
ravg_list = [ravg.send(cauchy(3,1)) for i in xrange(10)]
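
To make the send() protocol easier to follow, here is a small sketch of the same coroutine fed with fixed numbers instead of Cauchy samples. It is written for Python 3 (an assumption; the answer's code targets Python 2), and the helper name feed is hypothetical.

```python
def running_average():
    total = 0.0
    count = 0
    # The first yield only primes the coroutine; its value (NaN) is discarded.
    value = yield float('nan')
    while True:
        total += value
        count += 1
        # Hand back the current average and wait for the next sent value.
        value = yield total / count


def feed(values):
    # Hypothetical helper: prime the coroutine, then send each value in turn.
    ravg = running_average()
    next(ravg)
    return [ravg.send(v) for v in values]


print(feed([1, 2, 3, 4]))  # [1.0, 1.5, 2.0, 2.5]
```

Each call to send() resumes the coroutine at the yield where it paused, so the caller decides when and with what value the average is updated.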

Edits:

  • Using the next() function instead of the it.next() method. This is so it also will work with Python 3. The next() function has also been back-ported to Python 2.6+.
    In Python 2.5, you can either replace the calls with it.next(), or define a next function yourself.
    (Thanks Adam Parkin)
Answered 2009-11-24T16:23:24.467

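I've got two possible solutions here for you. Both are generic running-average functions that work on any list of numbers. (They could be made to work with any iterable.)

Generator based: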

nums = [cauchy(3,1) for x in xrange(10)]

def running_avg(numbers):
    for count in xrange(1, len(numbers)+1):
        yield sum(numbers[:count])/count

print list(running_avg(nums))

List comprehension based (really the same code as the previous one):

nums = [cauchy(3,1) for x in xrange(10)]

print [sum(nums[:count])/count for count in xrange(1, len(nums)+1)]

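Generator-compatible generator based:

Edit: I just tested this one to see whether I could easily make my solution compatible with generators, and what its performance would be. This is what I came up with.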

def running_avg(numbers):
    sum = 0
    for count, number in enumerate(numbers):
        sum += number
        yield sum/(count+1)

See the performance stats below; it was well worth it.
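
A short Python 3 sketch (an assumption; the answer's code is Python 2) of why this version counts as generator-compatible: it reads its input exactly once and never calls len() on it or slices it. The variable names are illustrative.

```python
def running_avg(numbers):
    total = 0
    # enumerate(..., 1) starts the count at 1, so no "+ 1" is needed.
    for count, number in enumerate(numbers, 1):
        total += number
        yield total / count


# A list works...
print(list(running_avg([2, 4, 6])))  # [2.0, 3.0, 4.0]

# ...and so does a generator, which has no len() and cannot be sliced.
squares = (n * n for n in range(1, 4))
print(list(running_avg(squares)))
```

The slice-based versions above would raise a TypeError on the generator input, since they depend on len() and slicing.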

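Performance characteristics:

Edit: I also decided to test Orip's interesting use of multiple generators to see the impact on performance.

Using timeit and the following (1,000,000 iterations, repeated 3 times):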

from timeit import Timer

print "Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())
print "LC based:", ', '.join(str(x) for x in Timer('[sum(nums[:count])/count for count in xrange(1, len(nums)+1)]', 'from __main__ import nums').repeat())
print "Orip's:", ', '.join(str(x) for x in Timer('list(itertools.islice(running_avgs, 10))', 'from __main__ import itertools, running_avgs').repeat())

print "Generator-compatible Generator based:", ', '.join(str(x) for x in Timer('list(running_avg(nums))', 'from __main__ import nums, running_avg').repeat())

I get the following results:

Generator based: 17.653908968, 17.8027219772, 18.0342400074
LC based: 14.3925321102, 14.4613749981, 14.4277560711
Orip's: 30.8035550117, 30.3142540455, 30.5146529675

Generator-compatible Generator based: 3.55352187157, 3.54164409637, 3.59098005295

See comments for code:

Orip's genEx based: 4.31488609314, 4.29926609993, 4.30518198013 

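Results are in seconds, and show the new generator-compatible generator method to be consistently faster, though your results may vary. I expect the huge difference between my original generator and the new one comes from the original re-summing the slice numbers[:count] at every step, whereas the new one keeps a running total.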
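
The gap can be reproduced in miniature. The sketch below, written for Python 3 with illustrative data rather than the Cauchy samples timed above, contrasts re-summing a slice at every step with carrying a running total. The function names slice_based and running_total_based are hypothetical, and no particular timings are claimed.

```python
import timeit


def slice_based(numbers):
    # Re-sums numbers[:count] at every step: quadratic work overall.
    return [sum(numbers[:count]) / count for count in range(1, len(numbers) + 1)]


def running_total_based(numbers):
    # Keeps a running total: one addition per element.
    averages = []
    total = 0
    for count, number in enumerate(numbers, 1):
        total += number
        averages.append(total / count)
    return averages


nums = list(range(1, 201))  # illustrative data only
assert slice_based(nums) == running_total_based(nums)

for func in (slice_based, running_total_based):
    seconds = timeit.timeit(lambda: func(nums), number=200)
    print(func.__name__, seconds)
```

With only 10 inputs, as in the benchmark above, the difference is smaller than with longer inputs, since the extra work of the slice-based version grows with the square of the input length.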

Answered 2009-11-24T15:26:48.510