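I'm using pandas to create a pivot table. My data usually contains many numeric values that can easily be aggregated with np.mean (e.g. question1), but there is one exception: Net Promoter Score (NPS). Note the 0.00 totals for EU and NA in the pivot table below.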

    responseId  country region  nps question1
0   1           Germany EU      11  3.2
1   2           Germany EU      10  5.0
2   3           US      NA      7   4.3
3   4           US      NA      5   4.8
4   5           France  EU      5   3.2
5   6           France  EU      5   5.0
6   7           France  EU      11  5.0

region                           EU               NA
country    France   Germany   Total   US       Total
nps        -33.33   100.0     0.00    -100.00   0.00
question1  4.40     4.1       4.25    4.55      4.55

For NPS I use a custom aggfunc:

import numpy as np
import pandas as pd

def calculate_nps(column):
    # Score buckets (scores run from 1 to 11 here)
    detractors = [1, 2, 3, 4, 5, 6, 7]
    passives = [8, 9]  # not used in the formula, listed for completeness
    promoters = [10, 11]

    # Share of responses per score
    counts = column.value_counts(normalize=True)
    percent_promoters = counts.reindex(promoters).sum()
    percent_detractors = counts.reindex(detractors).sum()

    return (percent_promoters - percent_detractors) * 100

aggfunc = {
    "nps": calculate_nps,
    "question1": np.mean
}

pd.pivot_table(
    data=df,
    columns=["region", "country"],
    values=["nps", "question1"],
    aggfunc=aggfunc,
    margins=True,
    margins_name="Total",
    sort=True,
)

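This aggfunc works fine for the regular columns but fails for the margins (the "Total" columns), because pandas passes already-aggregated data. For regular fields, calculate_nps receives a column like this: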

4     5
5     5
6    11
Name: nps, dtype: int64

But for the margins, the data looks like this:

region  country
EU      France     -33.333333
        Germany    100.000000
Name: nps, dtype: float64
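
The difference between the two inputs can be reproduced outside of pivot_table. The following is a minimal sketch, assuming the original calculate_nps from above and two hand-built Series that mimic what was printed (the variable names raw, aggregated, nps_raw and nps_margin are only illustrative):

```python
import pandas as pd

def calculate_nps(column):
    # Same logic as the original aggfunc, condensed.
    detractors = [1, 2, 3, 4, 5, 6, 7]
    promoters = [10, 11]
    counts = column.value_counts(normalize=True)
    return (counts.reindex(promoters).sum() - counts.reindex(detractors).sum()) * 100

# Raw scores for France, as received for a regular column.
raw = pd.Series([5, 5, 11], index=[4, 5, 6], name="nps")

# Per-country results, as received for the EU margin (hand-built copy).
aggregated = pd.Series(
    [-100 / 3, 100.0],
    index=pd.MultiIndex.from_tuples(
        [("EU", "France"), ("EU", "Germany")], names=["region", "country"]
    ),
    name="nps",
)

nps_raw = calculate_nps(raw)            # one promoter, two detractors
nps_margin = calculate_nps(aggregated)  # no value matches any bucket

print(nps_raw, nps_margin)
print(raw.index.names, aggregated.index.names)
```

The first call gives roughly -33.33. The second gives 0, because the float values -33.33 and 100.0 match neither the promoter nor the detractor scores, so both reindexed sums are empty.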

calculate_nps cannot handle such data and returns 0. In this case, column.mean() should be applied instead. I solved it like this (note the check if column.index.names != [None]):

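def calculate_nps(column):
    # Margins arrive as already-aggregated values with a named (region, country)
    # index; raw columns have an unnamed index. Average the aggregated values.
    if column.index.names != [None]:
        return column.mean()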
    
    detractors = [1,2,3,4,5,6,7]
    passives = [8,9]
    promoters = [10,11]
    
    counts = column.value_counts(normalize=True)
    percent_promoters = counts.reindex(promoters).sum()
    percent_detractors = counts.reindex(detractors).sum()
    
    return (percent_promoters - percent_detractors) * 100

Now the pivot table is correct:

region                           EU                  NA
country    France   Germany   Total   US          Total
nps        -33.33   100.0     33.33   -100.00   -100.00
question1  4.40     4.1        4.25   4.55         4.55

Question

Is there a proper/better way to determine the type of data being passed to aggfunc? I'm not sure my solution works in all scenarios.
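
For comparison, one way to avoid inspecting the input at all would be to compute the region totals directly from the raw rows with groupby. This is only a sketch under assumptions: the DataFrame is rebuilt by hand from the sample data above, and per_country and per_region are illustrative names. Note that it produces the NPS over all responses in a region, which is not the same number as the mean of the per-country scores.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "responseId": [1, 2, 3, 4, 5, 6, 7],
    "country": ["Germany", "Germany", "US", "US", "France", "France", "France"],
    "region": ["EU", "EU", "NA", "NA", "EU", "EU", "EU"],
    "nps": [11, 10, 7, 5, 5, 5, 11],
    "question1": [3.2, 5.0, 4.3, 4.8, 3.2, 5.0, 5.0],
})

def calculate_nps(column):
    # Original aggfunc, without the index-name check.
    detractors = [1, 2, 3, 4, 5, 6, 7]
    promoters = [10, 11]
    counts = column.value_counts(normalize=True)
    return (counts.reindex(promoters).sum() - counts.reindex(detractors).sum()) * 100

# Each call below only ever sees raw scores, so no type detection is needed.
per_country = df.groupby(["region", "country"])["nps"].apply(calculate_nps)
per_region = df.groupby("region")["nps"].apply(calculate_nps)

print(per_country)
print(per_region)
```

With the sample data, the EU value from per_region is about 20 (three promoters and two detractors out of five responses), whereas averaging the per-country scores gives 33.33. Which of the two is the intended "Total" depends on the reporting requirement.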
