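From the data at https://jsonplaceholder.typicode.com/todos, I want to count the "completed" items per user.

Currently, I solve this by first collecting the existing user IDs as dictionary keys. Then, for each user, I check every element of the dataset to see whether it belongs to that user and, if so, append it to that user's item list.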

users_items = {}

import json
from urllib import request

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)

def get_user_ids(items):
    for item in items:
        users_items[item['userId']] = None

def get_user_items():
    for uid in users_items:
        items = []
        for item in data:
            if(item['userId'] == uid):
                items.append(item['completed'])
        users_items[uid] = items

done_items_by_user = {}
def count_completed_by_user():
    for user in users_items:
        done_items_by_user[user] = sum(users_items[user])

get_user_ids(data)
get_user_items()
count_completed_by_user()
print(done_items_by_user)

I particularly dislike the double loop and the placeholder initialization of the dictionary values (to None) in get_user_ids.

3 Answers

Simply use a defaultdict object:

import json
from urllib import request
from collections import defaultdict

# Data from
uri = "https://jsonplaceholder.typicode.com/todos"

response = request.urlopen(uri).read()
data = json.loads(response)


def count_user_completed_items(data):
    result = defaultdict(int)
    for item in data:
        if item['completed']:
            result[item['userId']] += 1
    return dict(result)


print(count_user_completed_items(data))

Output (where each key is a user ID and each value is the number of completed items):

{1: 11, 2: 8, 3: 7, 4: 6, 5: 12, 6: 6, 7: 9, 8: 11, 9: 8, 10: 12}
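
To see why no key initialization is needed, here is a minimal offline sketch. The `sample` list is hypothetical data shaped like the /todos records, not real API output; the point is only that `defaultdict(int)` creates a missing key with the value 0 the first time it is incremented.

```python
from collections import defaultdict

# Hypothetical sample records shaped like the /todos entries (not real API data).
sample = [
    {"userId": 1, "id": 1, "title": "a", "completed": True},
    {"userId": 1, "id": 2, "title": "b", "completed": False},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
    {"userId": 1, "id": 4, "title": "d", "completed": True},
]

counts = defaultdict(int)
for item in sample:
    if item["completed"]:
        # A missing key is created with int() == 0 before the increment.
        counts[item["userId"]] += 1

print(dict(counts))  # {1: 2, 2: 1}
```

Because a key is only touched when an item is completed, a user with no completed items would not appear in the result at all.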
answered 2019-06-05T11:39:33.763

You can use the dict method get() to insert or update the count for each user ID:

done_items_by_user = dict()
for item in data:
    done_items_by_user[item['userId']] = done_items_by_user.get(item['userId'], 0) + item['completed']
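
The addition works because `bool` is a subclass of `int` in Python, so `True` counts as 1 and `False` as 0. A small sketch on hypothetical sample data (the `sample` list and the `uid` variable are illustrative, not from the original answer):

```python
# Hypothetical sample records; user 3 has no completed items.
sample = [
    {"userId": 1, "completed": True},
    {"userId": 3, "completed": False},
    {"userId": 1, "completed": True},
]

# bool is a subclass of int, so True adds 1 and False adds 0.
assert isinstance(True, int) and True + True == 2

done_items_by_user = {}
for item in sample:
    uid = item["userId"]
    # get() returns 0 when the user has not been seen yet.
    done_items_by_user[uid] = done_items_by_user.get(uid, 0) + item["completed"]

print(done_items_by_user)  # {1: 2, 3: 0}
```

Unlike the defaultdict version, which only touches a key when an item is completed, this loop records every user, so users with zero completed items appear with a count of 0.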
answered 2019-06-05T11:43:04.710

The popular pandas library lets you do this in one line:

import pandas as pd
complete_items_per_user = pd.DataFrame(data).groupby('userId')['completed'].sum()

If you are asking what can be done without pandas, you can use a dict comprehension to avoid explicit loops:

users = set(x['userId'] for x in data)
complete_items_per_user = {user: sum(x['completed'] for x in data if x['userId']==user) for user in users}
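
Note that the comprehension still scans `data` once per user, so the double loop is hidden rather than removed. As a possible single-pass alternative (a swapped-in technique, not part of the original answer), `collections.Counter` from the standard library can tally the completed items directly. The sketch below uses hypothetical sample data and compares both results:

```python
from collections import Counter

# Hypothetical sample records shaped like the /todos entries (not real API data).
sample = [
    {"userId": 1, "completed": True},
    {"userId": 2, "completed": False},
    {"userId": 1, "completed": True},
    {"userId": 3, "completed": True},
]

# The comprehension from the answer: one pass over the data per user.
users = set(x["userId"] for x in sample)
by_comprehension = {
    user: sum(x["completed"] for x in sample if x["userId"] == user)
    for user in users
}

# Single pass: count the user ID of every completed item.
single_pass = Counter(x["userId"] for x in sample if x["completed"])

# Counter returns 0 for keys it has never seen, so lookups agree for all users.
for user in users:
    assert single_pass[user] == by_comprehension[user]
```

The trade-off is that the `Counter` only stores users with at least one completed item, whereas the comprehension yields an explicit 0 for the others.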
answered 2019-06-05T11:44:31.347