2

The normal matplotlib boxplot command in Python returns a dictionary with keys for the boxes, median, whiskers, fliers, and caps. This makes styling really easy.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe and subset it for a boxplot
df1 = pd.DataFrame(rand(10), columns=['Col1'] )
df1['X'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])
boxes= [df1[df1['X'] == 'A'].Col1, df1[df1['X'] == 'B'].Col1]

# Call the standard matplotlib boxplot function,
# which returns a dictionary including the parts of the graph
mbp = plt.boxplot(boxes)
print(type(mbp))

# This dictionary output makes styling the boxplot easy
plt.setp(mbp['boxes'], color='blue')
plt.setp(mbp['medians'], color='red')
plt.setp(mbp['whiskers'], color='blue')
plt.setp(mbp['fliers'], color='blue')

The Pandas library has an "optimized" boxplot function for its grouped (hierarchically indexed ) dataframes. Instead of returning several dictionaries for each group, however, it returns an matplotlib.axes.AxesSubplot object. This makes styling very difficult.

# Pandas has a built-in boxplot function that returns
# a matplotlib.axes.AxesSubplot object
pbp = df1.boxplot(by='X')
print(type(pbp))

# Similar attempts at styling obviously return TypeErrors
plt.setp(pbp['boxes'], color='blue')
plt.setp(pbp['medians'], color='red')
plt.setp(pbp['whiskers'], color='blue')
plt.setp(pbp['fliers'], color='blue')

Is this AxisSubplot object produced by the pandas df.boxplot(by='X') function accessible?

4

2 回答 2

8

You could also specify the return_type as dict. This will return the boxplot properties directly in a dictionary, which is indexed by each column that was plotted in the boxplot.

To use the example above (in IPython):

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot( by='X', return_type='dict' )

>>> bp.keys()
['Col1', 'Col2']

>>> bp['Col1'].keys()
['boxes', 'fliers', 'medians', 'means', 'whiskers', 'caps']

Now, changing linewidths is a matter of a list comprehension :

>>> [ [item.set_linewidth( 2 ) for item in bp[key]['medians']] for key in bp.keys() ]
[[None, None], [None, None]]
于 2015-01-23T05:07:42.163 回答
2

恐怕你必须硬编码。举个pandas例子: http: //pandas.pydata.org/pandas-docs/stable/visualization.html#box-plotting

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot(by='X')
cl=bp[0].get_children()
cl=[item for item in cl if isinstance(item, matplotlib.lines.Line2D)]

现在让我们确定哪个是盒子、中位数等:

for i, item in enumerate(cl):
    if item.get_xdata().mean()>0:
        bp[0].text(item.get_xdata().mean(), item.get_ydata().mean(), str(i), va='center', ha='center')

情节是这样的:

在此处输入图像描述

每个栏由 8 个项目组成。例如,第 5 项是中位数。第 7 项和第 8 项可能是传单,我们这里没有。

知道了这些,修改栏的某些部分就很容易了。如果我们想将中位数设置linewidth为 2:

for i in range(_your_number_of_classes_2_in_this_case):
    cl[5+i*8].set_linewidth(2.)
于 2013-10-18T17:30:38.173 回答