I have data in a three-level nested dictionary:

 movie_id, date, customer_id, views
 0, (2011,12,22), 0, 22
 0, (2011,12,22), 1, 2
 0, (2011,12,22), 2, 12
 .....
 0, (2011,12,22), 7, 2
 0, (2011,12,23), 0, 123

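So the data records how many times each customer watched a movie on each day (there are only 8 customers).

Now I want to calculate, on average, how many times each customer watched a movie per day, averaged over the days on which that customer has a record for it.

So basically: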

    movie_id,customer_id, avg_views
     0, 0, 33.2
     0, 1 , 22.3

  and so on

What is the Pythonic way to solve this?

Thanks

Edit:

 data = defaultdict(lambda : defaultdict(dict))
 date = datetime.datetime(2011,1,22)
 data[0][date][0] = 22
 print data
 defaultdict(<function <lambda> at 0x00000000022F7CF8>, 
 {0: defaultdict(<type 'dict'>, 
 {datetime.datetime(2011, 1, 22, 0, 0): {0: 22}}))

Suppose there are only 2 customers, 1 movie, and 2 days of data:

 movie_id, date, customer_id,views
 0 , 2011,1,22,0,22
 0 , 2011,1,22,1,23
 0 , 2011,1,23,0,44

Note: customer 1 did not watch movie 0 on January 23. The answer is then:

 movie_id,customer_id,avg_views
  0   , 0 ,    (22+44)/2
  0,    1,      (23)/1

3 Answers


Here is my version:

from functools import reduce  # a builtin in Python 2, must be imported in Python 3

pool = [
    (0, (2011,12,22), 0, 22),
    (0, (2011,12,22), 1, 2),
    (0, (2011,12,22), 2, 12),
    (0, (2011,12,22), 7, 2),
    (0, (2011,12,23), 0, 123),
]


def calc(memo, row):
    # memo maps customer_id -> (number of rows seen, total views)
    num, value = memo.get(row[2], (0, 0))
    memo[row[2]] = (num + 1, value + row[3])
    return memo

# dict of (count, sum) per customer
v = reduce(calc, pool, {})
# compute the average; float() avoids integer division on Python 2
avg = map(lambda x: (x[0], float(x[1][1]) / x[1][0]), v.items())

print(dict(avg))

Here `dict(avg)` is a dictionary whose keys are customer_id values and whose values are the average views.
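
The accumulator above is keyed on customer_id only, which is enough while `pool` holds a single movie. With several movies, the same count-and-sum idea could be keyed on the (movie_id, customer_id) pair instead. The following is a minimal sketch under that assumption, not a definitive implementation: `average_views` is a hypothetical name, and the rows are the two-customer example from the question's edit, written as tuples.

```python
from collections import defaultdict


def average_views(rows):
    """rows: iterable of (movie_id, date, customer_id, views) tuples."""
    # (movie_id, customer_id) -> [number of days seen, total views]
    totals = defaultdict(lambda: [0, 0])
    for movie_id, _date, customer_id, views in rows:
        entry = totals[(movie_id, customer_id)]
        entry[0] += 1
        entry[1] += views
    # float() keeps the division exact on Python 2 as well
    return {key: views / float(days) for key, (days, views) in totals.items()}


# Example data from the question's edit (assumed tuple layout).
pool = [
    (0, (2011, 1, 22), 0, 22),
    (0, (2011, 1, 22), 1, 23),
    (0, (2011, 1, 23), 0, 44),
]

averages = average_views(pool)
for (movie_id, customer_id), avg in sorted(averages.items()):
    print("%d, %d, %.1f" % (movie_id, customer_id, avg))
# prints:
# 0, 0, 33.0
# 0, 1, 23.0
```

Customer 1 has a single row, so that average is divided by 1 rather than by the number of days in the data set, which matches the expected output in the question.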

answered 2012-11-26T16:24:05.653

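`sum` makes this easy. In my original version I used `dict.keys()` a lot, but iterating over a dictionary gives you the keys by default.

This function computes a single row of the result: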

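def average_daily_views(movie_id, customer_id, data):
    # Skip dates on which this customer has no record for the movie.
    daily_values = [data[movie_id][date][customer_id] for date in data[movie_id]
                    if customer_id in data[movie_id][date]]
    # float() avoids integer division on Python 2
    return sum(daily_values) / float(len(daily_values))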

Then you can loop over it to get the result in whatever form you want. Perhaps:

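def get_averages(data):
    # The customers for a movie are the keys of its per-date dicts.
    return dict(((movie, customer), average_daily_views(movie, customer, data))
                for movie in data
                for date in data[movie]
                for customer in data[movie][date])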
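
The same result can also be collected in a single pass over the nested dictionary, which sidesteps the question of what to do when a customer has no entry on a given date. This is a hedged sketch, assuming the movie -> date -> customer -> views layout from the question's edit; `collect_daily_views` is a hypothetical helper name, not part of the answer above.

```python
import datetime
from collections import defaultdict

# Sample data from the question's edit: movie -> date -> customer -> views.
data = defaultdict(lambda: defaultdict(dict))
data[0][datetime.datetime(2011, 1, 22)][0] = 22
data[0][datetime.datetime(2011, 1, 22)][1] = 23
data[0][datetime.datetime(2011, 1, 23)][0] = 44


def collect_daily_views(data):
    """Return {(movie_id, customer_id): [views on each day with a record]}."""
    daily = defaultdict(list)
    for movie_id, by_date in data.items():
        for by_customer in by_date.values():
            for customer_id, views in by_customer.items():
                daily[(movie_id, customer_id)].append(views)
    return daily


daily = collect_daily_views(data)
# Only days with a record are counted, so customer 1 is averaged over one day.
averages = {key: sum(v) / float(len(v)) for key, v in daily.items()}
```

Because only existing entries are visited, no lookup can raise `KeyError`, and each (movie, customer) pair is averaged over the days on which it actually appears.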
answered 2012-11-26T16:17:18.923

I think you should restructure your data a little so that it serves your purpose better:

import collections

# customer -> movie -> date -> views
restructured_data = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))
for movie in data:
    for date in data[movie]:
        for customer, count in data[movie][date].items():
            restructured_data[customer][movie][date] += count

averages = collections.defaultdict(dict)
for customer in restructured_data:
    for movie in restructured_data[customer]:
        daily = restructured_data[customer][movie]
        # float() avoids integer division on Python 2
        averages[movie][customer] = sum(daily.values()) / float(len(daily))

for movie in averages:
    for customer, avg in averages[movie].items():
        print("%d, %d, %f" % (movie, customer, avg))

Hope this helps.

answered 2012-11-26T16:40:27.680