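I have this Python script that reads a JSON file and picks out the car with the highest "total_sales" (that part is done). Now I need to work out how to find the year (car_year) with the most sales. In the sample JSON file below, 296 cars were sold in 2002, compared with 264 in 2007. I have figured out how to sum "total_sales" across the whole JSON file, but I still need to find the year with the highest sales.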

Python script:

#!/usr/bin/env python
import json
data = json.load(open('/home/ahmed/events.json'))
# find the item with the highest total_sales
event = max(data, key=lambda ev: ev['total_sales'])
print(event)
# sum "total_sales" across all the items in the JSON file
count = sum(int(x['total_sales']) for x in data)
print(count)

Here is the JSON file (test data):

[
    {
        "id": 47,
        "car": {
            "car_make": "Lamborghini",
            "car_model": "Murciélago",
            "car_year": 2002
        },
        "price": "$13724.05",
        "total_sales": 149
    },
    {
        "id": 48,
        "car": {
            "car_make": "volvo",
            "car_model": "x20",
            "car_year": 2010
        },
        "price": "$13724.05",
        "total_sales": 10
    },
    {
        "id": 49,
        "car": {
            "car_make": "kia",
            "car_model": "kia1.2",
            "car_year": 2007
        },
        "price": "$13724.05",
        "total_sales": 114
    },
    {
        "id": 50,
        "car": {
            "car_make": "renault",
            "car_model": "p300",
            "car_year": 2002
        },
        "price": "$13724.05",
        "total_sales": 147
    },
    {
        "id": 51,
        "car": {
            "car_make": "ferrari",
            "car_model": "red",
            "car_year": 2007
        },
        "price": "$13724.05",
        "total_sales": 150
    }
]
1 Answer

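  • Use pandas
  • Breakdown of df['car.car_year'][df.total_sales == df.total_sales.max()]
    • df['car.car_year'] selects the column to be returned
      • Use df instead to return all columns
    • [df.total_sales == df.total_sales.max()] creates a Boolean mask over all rows, True where total_sales equals total_sales.max()
  • Use pandas.DataFrame.groupby to group by specific columns and aggregate with different calculations, such as .sum and .max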
import pandas as pd
import json

# read the file
data = json.load(open('/home/ahmed/events.json'))

# load into pandas
df = pd.json_normalize(data)

# display(df)
   id      price  total_sales car.car_make car.car_model  car.car_year
0  47  $13724.05          149  Lamborghini    Murciélago          2002
1  48  $13724.05           10        volvo           x20          2010
2  49  $13724.05          114          kia        kia1.2          2007
3  50  $13724.05          147      renault          p300          2002
4  51  $13724.05          150      ferrari           red          2007

# sum of total_sales
df.total_sales.sum()

[out]: 
570

# year of max total_sales
df['car.car_year'][df.total_sales == df.total_sales.max()]

[out]:
4    2007
Name: car.car_year, dtype: int64

# find the total sales per year
dfg = df.groupby('car.car_year', as_index=False).agg({'total_sales': 'sum'})

# display(dfg)
   car.car_year  total_sales
0          2002          296
1          2007          264
2          2010           10

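# get the year of max sales: select the row of dfg with the largest yearly total
# (calling .max() on the grouped frame takes each column's maximum independently,
# which would pair the latest year, 2010, with the largest total, 296)
dfg.loc[dfg['total_sales'].idxmax()]

[out]:
car.car_year    2002
total_sales      296
Name: 0, dtype: int64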
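
If you would rather not pull in pandas, the same per-year grouping can be sketched with the standard library alone. This is only an illustration, not part of the approach above: the helper name sales_by_year and the inline SAMPLE string (a trimmed copy of the question's test data, keeping just the two fields used) are assumptions made so the snippet runs on its own.

```python
import json
from collections import Counter

# Trimmed copy of the question's test data (assumption: only
# "car.car_year" and "total_sales" are needed for this calculation).
SAMPLE = """
[
    {"id": 47, "car": {"car_year": 2002}, "total_sales": 149},
    {"id": 48, "car": {"car_year": 2010}, "total_sales": 10},
    {"id": 49, "car": {"car_year": 2007}, "total_sales": 114},
    {"id": 50, "car": {"car_year": 2002}, "total_sales": 147},
    {"id": 51, "car": {"car_year": 2007}, "total_sales": 150}
]
"""


def sales_by_year(records):
    """Return a Counter mapping each car_year to its summed total_sales."""
    totals = Counter()
    for record in records:
        totals[record["car"]["car_year"]] += int(record["total_sales"])
    return totals


records = json.loads(SAMPLE)
totals = sales_by_year(records)

# Pick the (year, total) pair with the largest total.
best_year, best_total = max(totals.items(), key=lambda item: item[1])
print(best_year, best_total)  # 2002 296
```

To run it against the real file, replace json.loads(SAMPLE) with json.load on the opened events.json. If two years tie for the highest total, max returns whichever it encounters first.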
answered 2020-08-17T23:18:04.223