
I could get the size info using groupby and add text to the corresponding location. But I can't help thinking there's a better way as this really seems mundane, something many people would like to see...

To illustrate, the following code generates a grouped boxplot:

import pandas as pd
from numpy.random import rand

df = pd.DataFrame(rand(100, 1), columns=['value'])
# .loc slices by label and includes the endpoint, so rows 0-23 become class A
df.loc[:23, 'class'] = 'A'
df.loc[24:, 'class'] = 'B'
df.boxplot(column='value', by='class')

What I'd like is to show the sample size of each class, A and B, namely 24 and 76 respectively. It could appear as a legend or somewhere near the boxes; either is OK with me.

Thanks!


1 Answer


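Put n in the class tick labels. I tried it as a legend, but I don't think that is as clear. R has more boxplot options, including making the width of each box proportional to its sample size; that is not the default in matplotlib, but it is simple to do and seems very readable: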

import pandas as pd
from numpy.random import rand, randint

df = pd.DataFrame(rand(100, 1), columns=['value'])

# Split the rows into three classes at two random cut points
cut1 = randint(2, 47)
cut2 = randint(52, 97)
# .loc slices by label and includes both endpoints (.ix has been removed from pandas)
df.loc[:cut1, 'class'] = 'A'
df.loc[cut1+1:cut2, 'class'] = 'B'
df.loc[cut2+1:, 'class'] = 'C'

dfg = df.groupby('class')

counts = [len(v) for k, v in dfg]
total = float(sum(counts))

# Box widths proportional to each class's share of the sample
widths = [c/total for c in counts]

cax = df.boxplot(column='value', by='class', widths=widths)
# Put the class name and its n in each tick label
cax.set_xticklabels(['%s\n$n$=%d' % (k, len(v)) for k, v in dfg])

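If you want this for more than one plot, the same idea can be wrapped in a small helper. This is only a sketch: the function name `size_labels_and_widths` is made up for illustration, and it assumes the boxes are drawn in the same sorted key order that `groupby` uses (which is what the grouped boxplot does by default). It uses the 24/76 split from the question so the labels are predictable.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def size_labels_and_widths(df, by):
    """Hypothetical helper: tick labels carrying n, plus widths proportional to group size."""
    sizes = df.groupby(by).size()  # Series of counts, indexed by sorted group key
    labels = ['%s\n$n$=%d' % (k, n) for k, n in sizes.items()]
    widths = (sizes / sizes.sum()).tolist()  # each group's share of all rows
    return labels, widths


# Same layout as the question: 24 rows in class A, 76 in class B
rng = np.random.default_rng(0)
df = pd.DataFrame({'value': rng.random(100)})
df['class'] = ['A'] * 24 + ['B'] * 76

labels, widths = size_labels_and_widths(df, 'class')

# Passing our own Axes avoids depending on what df.boxplot returns
fig, ax = plt.subplots()
df.boxplot(column='value', by='class', widths=widths, ax=ax)
ax.set_xticklabels(labels)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. Because the widths sum to 1, a very small group gets a very thin box; scaling them by a constant is an easy tweak if that hurts readability.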

Answered 2015-03-30T18:42:43.463