26

I have a DataFrame containing time series data, with a multi-level index (MultiIndex) on the columns. Here is a sample of what the data looks like; it is stored in CSV format. Loading the data is not an issue here.

enter image description here

What I want to do is create a boxplot of this data, grouped by the categories in a specific level of the MultiIndex. For example, if I group by 'SPECIES' I would have the groups 'aq', 'gr', 'mix' and 'sed', and one box for each group at a specific time in the time series.

I've tried this:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()

but it gives me a boxplot (a flat line) for each point in the group rather than one box for the grouped set. Is there an easy way to do this? I have no problem with the grouping itself, since I can aggregate the groups any way I want, but I can't get them to boxplot.

4

3 Answers

56

This code:

data['2013-08-17'].boxplot(by='SPECIES')

does not work, because boxplot is a method of DataFrame, not of Series.

In pandas > 0.18.1, the column parameter of the boxplot function defines which column the data is taken from.

So

data.boxplot(column='2013-08-17',by='SPECIES')

should return the desired result.

An example with the iris dataset:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/iris.csv')
fig, ax = plt.subplots(figsize=(10,8))
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# Clear the figure title after plotting; boxplot(by=...) sets it automatically
plt.suptitle('')

produces:

Boxplot of the iris dataset with pandas

plt.suptitle('') 

turns off the annoying automatic title. Of course, the column parameter also accepts a list of columns, so

data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)

also works.
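For data shaped like the question's (timestamps as rows, a MultiIndex on the columns), SPECIES is an index level rather than an ordinary column, so the call above probably needs a reshaping step first. The following is only a sketch under assumptions: the data is synthetic, the SITE level name and the value column name are made up for illustration, and the Agg backend is selected just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data. The SITE level and all values
# are hypothetical; only the SPECIES categories come from the question.
rng = np.random.default_rng(0)
species = ["aq", "gr", "mix", "sed"]
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3", "s4", "s5"], species], names=["SITE", "SPECIES"]
)
index = pd.date_range("2013-08-17", periods=3, freq="D")
data = pd.DataFrame(
    rng.normal(size=(len(index), len(columns))), index=index, columns=columns
)

# Select one timestamp: the result is a Series indexed by the column MultiIndex.
snapshot = data.loc[pd.Timestamp("2013-08-17")]

# Turn the index levels into ordinary columns (one row per original column),
# so that SPECIES can be passed to the by= argument.
long_form = snapshot.rename("value").reset_index()

fig, ax = plt.subplots(figsize=(8, 6))
long_form.boxplot(column="value", by="SPECIES", ax=ax)
fig.suptitle("")  # clear the automatic title after plotting
```

Each box then summarises all values that share a SPECIES label at that timestamp.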

answered 2016-11-01T11:22:01.063
15

I think I figured it out; maybe this will help someone:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES').T
grouped.boxplot()

Basically, the groupby output needs to be transposed so that the boxplot shows the correct grouping:

enter image description here
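One possible reading of why the orientation matters: DataFrame.boxplot draws one box per column, so a single timestamp with one value per column yields flat lines, while a frame with one column per species yields one box per species. The sketch below illustrates that idea with unstack rather than the exact code above; the data is synthetic and the SITE level name is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import numpy as np
import pandas as pd

# Synthetic data with a (SITE, SPECIES) MultiIndex on the columns.
# SITE and the values are hypothetical.
rng = np.random.default_rng(1)
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]], names=["SITE", "SPECIES"]
)
index = pd.date_range("2013-08-17", periods=2, freq="D")
data = pd.DataFrame(rng.normal(size=(2, 12)), index=index, columns=columns)

# One timestamp as a Series indexed by (SITE, SPECIES).
snapshot = data.loc[pd.Timestamp("2013-08-17")]

# Move SPECIES into the columns: rows are sites, columns are species.
per_species = snapshot.unstack("SPECIES")

# boxplot draws one box per column, i.e. one box per species.
ax = per_species.boxplot()
```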

answered 2013-08-28T22:39:29.677
1

This should work in version 0.16:

data['2013-08-17'].boxplot(by='SPECIES')
answered 2016-10-17T18:18:57.377