I have a dataframe:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
df= {
'Gen':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
'Site':['FRX','FX','FRX','FRX','FRX','FX','FRX','FX','FX','FX','FX','FRX','FRX','FRX','FRX','FRX'],
'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
'UID':[1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004],
'color':['R','R','G','G','B','G','B','B','R','G','R','G','B','B','R','G'],
'Time2':[150.78,162.34,188.53,197.69,208.07,217.76,229.48,139.51,146.87,182.54,189.57,199.97,229.28,244.73,269.91,249.19],
'Time3':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
'Time4':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
'Time5':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23]
}
df = pd.DataFrame(df,columns = ['Gen','Site','Type','UID','color','Time2','Time3','Time4','Time5'])
df.info()
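The split the question asks for can be read directly off the dtypes that `df.info()` reports. Here is a minimal sketch using `DataFrame.select_dtypes`. It assumes pandas stores the text columns as `object`, which is the pandas 2.x default. `'string'` is also included as a hedge for pandas' dedicated string dtype. Only the first four rows of the question's data are used, since the dtypes do not depend on the row count.

```python
import pandas as pd

# First four rows of the question's data, enough to show the dtypes
df = pd.DataFrame({
    'Gen': ['M', 'M', 'M', 'M'],
    'Site': ['FRX', 'FX', 'FRX', 'FRX'],
    'Type': ['L', 'L', 'L', 'L'],
    'UID': [1001, 1002, 1003, 1004],
    'color': ['R', 'R', 'G', 'G'],
    'Time2': [150.78, 162.34, 188.53, 197.69],
    'Time3': [250.78, 262.34, 288.53, 297.69],
    'Time4': [240.18, 232.14, 258.53, 276.69],
    'Time5': [270.84, 282.14, 298.53, 306.69],
})

# Text columns -> countplots ('string' also covers pandas' dedicated string dtype)
object_cols = df.select_dtypes(include=['object', 'string']).columns.tolist()
# float64 columns -> boxplots
float_cols = df.select_dtypes(include='float').columns.tolist()
# Whatever is left (here only the int64 UID column) gets no plot
skipped = [c for c in df.columns if c not in object_cols + float_cols]

print(object_cols)  # ['Gen', 'Site', 'Type', 'color']
print(float_cols)   # ['Time2', 'Time3', 'Time4', 'Time5']
print(skipped)      # ['UID']
```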
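I want to write a function that takes a dataframe and does the following:
1. countplots for the columns with the object dtype (4 countplots, for the Gen, Site, Type and color columns)
2. boxplots for the columns with the float dtype (4 boxplots, for the Time2, ..., Time5 columns)
3. export the graphs to a PDF file, two graphs per page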
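With two graphs per page, the 8 graphs fill 8 / 2 = 4 pages. In column order, the last page holds the Time4 and Time5 boxplots. The sketch below shows that pairing in plain Python. `plan` and `paginate` are hypothetical names introduced here, and the column lists are hard-coded from the question.

```python
# Hypothetical plot plan: one (plot kind, column) entry per graph, in column order
plan = ([('count', c) for c in ['Gen', 'Site', 'Type', 'color']]
        + [('box', c) for c in ['Time2', 'Time3', 'Time4', 'Time5']])


def paginate(items, per_page=2):
    """Split items into consecutive chunks of at most per_page elements."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


pages = paginate(plan, per_page=2)
print(len(pages))  # 4
print(pages[0])    # [('count', 'Gen'), ('count', 'Site')]
```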
My attempt:
# I am open to other approaches
def data_explorer(data):
    for col in data.columns:
        # 1. countplots for columns with the object dtype
        if data[col].dtype == 'object':
            sns.countplot(x = col, data = data)
        # 2. boxplots for columns with the float dtype
        elif data[col].dtype == 'float':
            sns.boxplot(data[col])
        else:
            print("skip integer dtype")
    # 3. save the graphs as pdf, two graphs per page
    plt.savefig('data_exploration.pdf')
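In the attempt, every countplot and boxplot is drawn onto the same implicit Axes of a single figure, because seaborn uses the current Axes when no `ax` is passed. `plt.savefig` then writes that one figure as a single page.

One possible approach is sketched below:

- Open a `matplotlib.backends.backend_pdf.PdfPages` file.
- Create a fresh figure with its own row of Axes for each page.
- Pass each seaborn call an explicit `ax=`.
- Save each figure as one page.

This is a sketch, not a definitive implementation. The `path` and `per_page` parameters, the `'count'`/`'box'` plan entries, the figure size and the titles are illustrative choices. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: only file output is needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import seaborn as sns


def data_explorer(data, path='data_exploration.pdf', per_page=2):
    """Save a countplot per object column and a boxplot per float column
    to a multi-page PDF, per_page graphs per page. Returns the page count."""
    # Plot plan in column order; int columns such as UID are skipped
    object_cols = data.select_dtypes(include=['object', 'string']).columns.tolist()
    float_cols = data.select_dtypes(include='float').columns.tolist()
    plan = [('count', c) for c in object_cols] + [('box', c) for c in float_cols]

    n_pages = 0
    with PdfPages(path) as pdf:
        for start in range(0, len(plan), per_page):
            chunk = plan[start:start + per_page]
            # New figure per page with its own Axes, so plots never overlap
            fig, axes = plt.subplots(1, per_page, figsize=(11, 4.5), squeeze=False)
            axes = axes.ravel()
            for ax, (kind, col) in zip(axes, chunk):
                if kind == 'count':
                    sns.countplot(x=col, data=data, ax=ax)
                else:
                    sns.boxplot(y=data[col], ax=ax)
                ax.set_title(f'{kind}plot of {col}')
            # Hide leftover Axes when the last page is not full
            for ax in axes[len(chunk):]:
                ax.set_visible(False)
            fig.tight_layout()
            pdf.savefig(fig)  # each savefig call adds one page
            plt.close(fig)
            n_pages += 1
    return n_pages


df = pd.DataFrame({
    'Gen':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
    'Site':['FRX','FX','FRX','FRX','FRX','FX','FRX','FX','FX','FX','FX','FRX','FRX','FRX','FRX','FRX'],
    'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
    'UID':[1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004],
    'color':['R','R','G','G','B','G','B','B','R','G','R','G','B','B','R','G'],
    'Time2':[150.78,162.34,188.53,197.69,208.07,217.76,229.48,139.51,146.87,182.54,189.57,199.97,229.28,244.73,269.91,249.19],
    'Time3':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
    'Time4':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
    'Time5':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23],
})

n_pages = data_explorer(df)
print(n_pages)  # 4
```

The 8 graphs end up on 4 pages of two. Closing each figure after saving keeps memory flat when the dataframe has many columns.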
Please note: the final output should contain 8 graphs in total.