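I am trying to draw several subplots to analyze the average call duration per agent, per date. I read this information from a SQL table and load it into a pandas DataFrame. Not all agents share the same number of days, or even the same dates, so sharex=True makes no sense.
I came up with this: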
import pandas as pd
import matplotlib.pyplot as plt

# df is the DataFrame loaded from the SQL table
df2 = df.groupby(['agent_id', 'call_date'])['duration_minutes'].mean()

#Figure out number of rows needed for 2 column grid plot
#Also accounts for odd number of plots
group_len = len(df2.groupby('agent_id'))
#nrows = int(math.ceil(group_len/2.))

#Setup Subplots (squeeze=False keeps axs 2-D even when there is a single agent)
fig, axs = plt.subplots(group_len, 1, sharex=False, sharey=True, squeeze=False)
for i, (agent_id, _) in enumerate(df2.groupby('agent_id')):
    ax = axs[i, 0]
    #print(df2.loc[agent_id])
    # df2.loc[agent_id] is a Series indexed by call_date, so no x/y arguments are needed
    df2.loc[agent_id].plot(kind='line', legend=False, ax=ax, marker='*')
    ax.tick_params(axis='both', which='both', labelsize=7)
    ax.legend(['Agent Id: ' + str(agent_id)])
    #ax.set_title('Agent Id: ' + str(i),fontsize=8)
    #ax.yaxis.set_ticks_position('none')
    ax.set_xlabel('Day')
    #ax.set_ylabel('Agent Id: ' + str(i),fontsize=8)
    #plt.xticks(rotation=90)

plt.suptitle('Avg call duration per day, per agent', verticalalignment='bottom', fontsize=12)
plt.tight_layout()
#df1.plot(x ='agent_id', y='duration_minutes', kind = 'bar', title='Avg Call duration per agent')
plt.show()
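I wanted to improve this output, but I tried a lot of things without much luck. I want to keep working with pandas DataFrames, so I narrowed my research down to Cufflinks, which I am using now. I came up with the solution below, but if possible I would like each plot to have legend=agent_id and its own color.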
import pandas as pd
#import seaborn as sns
#Cufflinks is a third-party wrapper library around Plotly, inspired by the Pandas .plot() API.
import cufflinks as cf
from plotly.offline import iplot, plot

df = pd.DataFrame(SQL_Query, columns=['id','agent_id','duration_minutes','call_date','inbound'])
# 2) Get the avg of duration per agent, per day
df2 = df.groupby(['agent_id', 'call_date'])['duration_minutes'].mean()

fig_array = []
# groupby yields (key, group) pairs, so the key is the agent id
for agent_id, var in df2.groupby('agent_id'):
    #print(var)
    #print('--------------------------------------------------')
    fig = var.reset_index().iplot(theme='pearl', asFigure=True,
                                  x='call_date', y='duration_minutes',
                                  kind='line',
                                  xTitle='', yTitle='Duration (min)',
                                  title=str(agent_id),
                                  world_readable=True)
    fig.update_layout(showlegend=False)
    fig.update_traces(texttemplate='%{y:.2f}',
                      hovertemplate='<b>Day: </b>%{x} <br><b>Avg duration(min): </b>%{y}')
    fig_array.append(fig)

# One subplot row per agent figure
fig = cf.subplots(fig_array, shape=(len(fig_array), 1))
#iplot(fig)
plot(fig, filename='avg_duration_per_day_per_agent.html')
The CSV file (I actually read from a SQL table, but the data is the same) looks like this:
id,agentid,duration,date,inbound
1,3,10.52,2019/05/01,true
2,1,12.93,2019/04/06,false
3,2,10.32,2019/06/14,true
4,3,8.84,2019/06/13,false
5,3,13.43,2019/05/06,false
6,3,4.78,2019/05/04,false
7,1,9.21,2019/06/21,true
8,5,9,2019/05/26,true
9,5,12.49,2019/06/04,true
10,3,3.68,2019/05/05,false
11,2,6.06,2019/06/22,false
12,4,7.66,2019/06/20,false
13,2,6.17,2019/06/15,true
14,4,13.6,2019/06/26,true
...
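To make the question reproducible without the SQL table, here is a minimal sketch that loads the sample rows above and computes the same per-agent, per-day average. The renaming from the CSV header (agentid, duration, date) to the column names used in the plotting code (agent_id, duration_minutes, call_date) is an assumption about how the two correspond.

```python
import io
import pandas as pd

# The sample rows shown above; the real data comes from a SQL table.
CSV_TEXT = """id,agentid,duration,date,inbound
1,3,10.52,2019/05/01,true
2,1,12.93,2019/04/06,false
3,2,10.32,2019/06/14,true
4,3,8.84,2019/06/13,false
5,3,13.43,2019/05/06,false
6,3,4.78,2019/05/04,false
7,1,9.21,2019/06/21,true
8,5,9,2019/05/26,true
9,5,12.49,2019/06/04,true
10,3,3.68,2019/05/05,false
11,2,6.06,2019/06/22,false
12,4,7.66,2019/06/20,false
13,2,6.17,2019/06/15,true
14,4,13.6,2019/06/26,true
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

# Assumed mapping from the CSV header to the names used in the plotting code.
df = df.rename(columns={"agentid": "agent_id",
                        "duration": "duration_minutes",
                        "date": "call_date"})

# Average duration per agent, per day (a Series with a two-level index).
avg = df.groupby(["agent_id", "call_date"])["duration_minutes"].mean()

# Number of distinct days per agent: it differs between agents,
# which is why a shared x axis is not useful here.
days_per_agent = avg.groupby(level="agent_id").size()
print(days_per_agent)
```
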
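I would like to show a more visual/prettier chart, but I am stuck on customizing these multi-plots, because I cannot hide the subplots' legends, titles, and so on. With a single plot, though, I managed it perfectly. How can I set legend=agent_id and a title for each subplot, plus titles for each x and y axis? Nothing I try works.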