
Sample df:

     company   vehicle registration
0   company1     truck       abc123
1   company1     truck      abcdefg
2   company1       car       234cse
3   company1  forklift          NaN
4   company1     truck        93ds2
5   company2       car      rentall
6   company2       car      rental2
7   company2     truck      rentals
8   company2     truck      rental*
9   company2       car      rental5
10  company3     truck       fdsa23
11  company3     truck        asdf4
12  company3     other       fdsag3
13  company3     other          NaN
14  company3     truck      gls319d


My goal is to count the vehicles by company and vehicle type (the registration and vehicle columns will be dropped).

I tried this:

import pandas as pd

df = pd.read_csv('path to csv', header=0)

df.loc[df.vehicle == 'truck', 'trucks'] = 1
df.loc[df.vehicle == 'car', 'cars'] = 1
df.loc[df.vehicle != 'truck', 'others'] = 1
df.loc[df.vehicle != 'cars', 'others'] = 1

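From there I assumed some kind of groupby and sum would collapse the rows and columns.

Unfortunately, this just fills the vehicle column with '1' values instead of populating the corresponding columns.

My desired output is: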


company   trucks  cars  others
company1  3       1     1 
company2  2       3     0
company3  3       0     2

I'm sure this has probably been answered before, but my google-fu is weak this morning.

Cheers.


1 Answer


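First use Series.map with a dictionary to relabel the categories you want to keep, then replace every non-matched value (NaN) with Series.fillna.

Then pass the result to crosstab; if the order of the output columns matters, add DataFrame.reindex: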

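# Map known vehicle types to their output labels; everything else becomes 'other'
df['new'] = df.vehicle.map({'truck':'trucks', 'car':'cars'}).fillna('other')
# Count rows per company and label, then fix the column order
df = pd.crosstab(df['company'], df['new']).reindex(['cars','trucks','other'], axis=1)
print(df)
new       cars  trucks  other
company                      
company1     1       3      1
company2     3       2      0
company3     0       3      2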
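
Since the question mentioned groupby, here is a rough sketch of the same map/fillna idea done with groupby, size and unstack instead of crosstab. The DataFrame is rebuilt by hand from the sample (registration left out), and the names labels, kind and counts are only illustrative. It uses 'others' as the fallback label so the column names match the desired output; treat it as one possible variant, not the only way to do it.

```python
import pandas as pd

# Hand-built copy of the sample data from the question (registration omitted).
df = pd.DataFrame({
    "company": ["company1"] * 5 + ["company2"] * 5 + ["company3"] * 5,
    "vehicle": ["truck", "truck", "car", "forklift", "truck",
                "car", "car", "truck", "truck", "car",
                "truck", "truck", "other", "other", "truck"],
})

# Relabel trucks and cars; anything not in the dict maps to NaN,
# which fillna then turns into the catch-all 'others' label.
labels = df["vehicle"].map({"truck": "trucks", "car": "cars"}).fillna("others")

counts = (
    df.groupby(["company", labels.rename("kind")])  # group by company and label
      .size()                                       # number of rows per group
      .unstack(fill_value=0)                        # labels become columns
      .reindex(["trucks", "cars", "others"], axis=1, fill_value=0)
      .reset_index()                                # company back to a column
)
counts.columns.name = None  # drop the leftover 'kind' header
print(counts)
```

The reindex with fill_value=0 also guarantees that all three columns exist even if one label never occurs in the data.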
answered 2020-04-06T10:14:08.900