
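I have a list of dictionaries with the keys "a", "n", "o", and "u". Is there a way to speed up this calculation, for example with NumPy? The list contains tens of thousands of items.

The data is pulled from a database, so I'm stuck with it arriving as a list of dictionaries.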

x = n = o = u = loops = 0  # loops must be initialized before it is incremented
for entry in indata:
    x += entry['a'] * entry['n']  # n - number of data points
    n += entry['n']
    o += entry['o']
    u += entry['u']

    loops += 1

average = int(round(x / n)), n, o, u

3 Answers

3

I doubt this will be much faster, but I suppose it's a candidate for timeit...

from operator import itemgetter
x = n = o = u = 0
items = itemgetter('a','n','o','u')
for entry in indata:
    A,N,O,U = items(entry)
    x += A*N  # n - number of data points
    n += N
    o += O    #don't know what you're doing with O or U, but I'll leave them
    u += U

average = int(round(x / n)), n, o, u

At the very least, it saves one lookup of entry['n'], since the value is now stored in a variable.
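To illustrate what the loop relies on: `operator.itemgetter` called with several keys returns a callable that pulls all of them out of a mapping as one tuple. A minimal sketch, using a made-up `entry`:

```python
from operator import itemgetter

# Hypothetical sample row; the values are made up for illustration.
entry = {'a': 2.5, 'n': 4, 'o': 1, 'u': 0}

items = itemgetter('a', 'n', 'o', 'u')
A, N, O, U = items(entry)  # one call, four values, in the order requested
print(A, N, O, U)
```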

Answered 2012-10-22T15:51:00.567
2

You could try something like this:


mean_a = np.sum(np.array([d['a'] for d in data]) * np.array([d['n'] for d in data])) / len(data)
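Note that this expression divides by `len(data)`, whereas the loop in the question divides by the sum of the `n` values. If the weighted mean from the question is what is wanted, a sketch using `np.average` with its `weights` argument might look like this (the two-row `data` is made up for illustration):

```python
import numpy as np

# Made-up sample in the same shape as the question's list of dicts.
data = [
    {'a': 10.0, 'n': 1, 'o': 0, 'u': 0},
    {'a': 20.0, 'n': 3, 'o': 0, 'u': 0},
]

a = np.array([d['a'] for d in data])
n = np.array([d['n'] for d in data])

# Equivalent to sum(a * n) / sum(n), i.e. the question's x / n.
weighted_mean = np.average(a, weights=n)
print(weighted_mean)
```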

Edit: actually, @mgilson's approach above is faster:


import numpy as np
from operator import itemgetter
from pandas import DataFrame

# Build 100,000 dicts of random values as test data.
data = []
for i in range(100000):
    data.append({'a': np.random.random(), 'n': np.random.random(),
                 'o': np.random.random(), 'u': np.random.random()})

def func1(data):
    x = n = o = u = 0
    items = itemgetter('a','n','o','u')
    for entry in data:
        A,N,O,U = items(entry)
        x += A*N  # n - number of data points
        n += N
        o += O    #don't know what you're doing with O or U, but I'll leave them
        u += U

    average = int(round(x / n)), n, o, u
    return average

def func2(data):
    # The closing parenthesis of np.sum(...) was missing in the original line.
    mean_a = np.sum(np.array([d['a'] for d in data]) * np.array([d['n'] for d in data])) / len(data)
    return (mean_a,
            np.sum([d['n'] for d in data]),
            np.sum([d['o'] for d in data]),
            np.sum([d['u'] for d in data]))

def func3(data):
    dframe = DataFrame(data)
    return (np.sum(dframe["a"] * dframe["n"]) / dframe.shape[0],
            np.sum(dframe["n"]), np.sum(dframe["o"]), np.sum(dframe["u"]))

In [3]: %timeit func1(data)
10 loops, best of 3: 59.6 ms per loop

In [4]: %timeit func2(data)
10 loops, best of 3: 138 ms per loop

In [5]: %timeit func3(data)
10 loops, best of 3: 129 ms per loop

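If you are doing other operations on the data, I would definitely consider the pandas package. Its DataFrame object is a good match for the list of dictionaries you are working with. I think most of the overhead comes from getting the data into NumPy arrays or a DataFrame object in the first place.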
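One way to check that guess is to time the conversion step separately from the arithmetic. The sketch below does this with the standard `timeit` module and NumPy only; the helper names `convert` and `compute` and the data size are made up for illustration, and the numbers will vary by machine:

```python
import timeit
import numpy as np

# Made-up data in the same shape as the benchmark above (two keys only).
rng = np.random.default_rng(0)
data = [{'a': rng.random(), 'n': rng.random()} for _ in range(10000)]

def convert(data):
    # Pull the two columns out of the dicts into NumPy arrays.
    a = np.array([d['a'] for d in data])
    n = np.array([d['n'] for d in data])
    return a, n

def compute(a, n):
    # The vectorized arithmetic on already-converted arrays.
    return np.sum(a * n) / np.sum(n)

a, n = convert(data)
t_convert = timeit.timeit(lambda: convert(data), number=20)
t_compute = timeit.timeit(lambda: compute(a, n), number=20)
print(f"convert: {t_convert:.4f}s  compute: {t_compute:.4f}s")
```

If the conversion dominates, converting once and reusing the arrays for several calculations is where NumPy or pandas would be expected to pay off.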

Answered 2012-10-22T17:04:08.503
0

If all you're trying to do is get the average of something, why not just do this?

import math

sum_for_average = math.fsum(your_item)  # your_item: any sequence of numbers
average_of_list = sum_for_average / len(your_item)

No need to bother with NumPy at all.
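Applied to the weighted average from the question, that idea might look like the sketch below. The two-row `indata` is made up for illustration; `math.fsum` accepts any iterable of numbers, so a generator expression works:

```python
import math

# Made-up sample in the same shape as the question's data.
indata = [
    {'a': 10.0, 'n': 1, 'o': 2, 'u': 3},
    {'a': 20.0, 'n': 4, 'o': 4, 'u': 5},
]

x = math.fsum(e['a'] * e['n'] for e in indata)  # weighted sum of 'a'
n = sum(e['n'] for e in indata)                 # total number of data points
average = int(round(x / n))
print(average, n)
```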

Answered 2012-10-22T15:29:54.063