python - seaborn's standard deviation error bars seem too small

Question

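I originally used the numpy function .std on my dataframe to get the standard deviation and plotted it with matplotlib. Later, I tried to make the same chart with seaborn. The two plots looked close enough until I overlaid them and found that all of seaborn's error bars are smaller; the larger the bars, the more noticeable the difference. I checked in different software that the .std results are correct, and they are plotted correctly as well. What is the root of the problem (I can't seem to extract the plot's source data from seaborn)?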

I used this code: ax_sns = sns.barplot(x = 'name', y = column_to_plot, data=data, hue='method', capsize=0.1, ci='sd', errwidth=0.9)

Chart: the seaborn error bars (darker) are smaller

score 1 · Accepted Answer

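You didn't provide the code you used to calculate the standard deviation. Perhaps you used pandas' .std(). Seaborn uses NumPy's. By default, NumPy's std does not apply Bessel's correction: it divides by n (ddof=0), whereas pandas' .std() divides by n - 1 (ddof=1). The difference is most visible when the number of data points is small (when the gap between / n and / (n-1) is larger).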
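As a minimal sketch of that difference, the snippet below compares the two conventions on a small made-up sample (the values are illustrative only, not the asker's data). It uses only NumPy's `ddof` argument; `ddof=1` reproduces what pandas' `.std()` computes by default.

```python
import numpy as np

# Illustrative sample (hypothetical values, chosen so the arithmetic is easy to follow)
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(values)

# NumPy default: divide the summed squared deviations by n (ddof=0)
population_sd = np.std(values)

# Bessel's correction: divide by n - 1 (ddof=1), the pandas .std() default
sample_sd = np.std(values, ddof=1)

# The two always differ by the factor sqrt(n / (n - 1))
ratio = sample_sd / population_sd

print(f"ddof=0: {population_sd:.4f}")
print(f"ddof=1: {sample_sd:.4f}")
print(f"ratio:  {ratio:.4f}  (sqrt(n/(n-1)) = {np.sqrt(n / (n - 1)):.4f})")
```

Because the ratio is a constant factor for a given n, the absolute gap grows with the size of the error bar, which matches the observation that the larger bars differ more visibly.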

The following code visualizes the difference between the error bars computed by seaborn, numpy and pandas.

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

flights = sns.load_dataset('flights')
fig, ax = plt.subplots(figsize=(12, 5))
sns.barplot(x='month', y='passengers', data=flights, capsize=0.1, ci='sd', errwidth=0.9, fc='yellow', ec='blue', ax=ax)

flights['month'] = flights['month'].cat.codes  # change to a numeric format
for month, data in flights.groupby('month'):
    mean = data['passengers'].mean()
    pandas_std = data['passengers'].std()
    numpy_std = np.std(data['passengers'])
    ax.errorbar(month - 0.2, mean, yerr=numpy_std, ecolor='crimson', capsize=8,
                label='numpy std()' if month == 0 else None)
    ax.errorbar(month + 0.2, mean, yerr=pandas_std, ecolor='darkgreen', capsize=8,
                label='pandas std()' if month == 0 else None)
ax.margins(x=0.015)
ax.legend()
plt.tight_layout()
plt.show()
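If the goal is to make seaborn draw the pandas-style (n - 1) error bars, one possible approach is sketched below. It assumes seaborn 0.12 or newer, where `barplot` accepts an `errorbar` argument that may be a function mapping a vector to a (min, max) interval. The helper names `sample_sd_interval` and `plot_with_sample_sd` are made up for this illustration.

```python
import numpy as np


def sample_sd_interval(values):
    """Return (mean - sd, mean + sd), with sd computed using ddof=1."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sd = values.std(ddof=1)  # Bessel-corrected, same as pandas .std()
    return mean - sd, mean + sd


def plot_with_sample_sd():
    """Draw the flights bar plot with Bessel-corrected error bars.

    Assumes seaborn >= 0.12 (callable `errorbar`); loading the example
    dataset requires network access.
    """
    import matplotlib.pyplot as plt
    import seaborn as sns

    flights = sns.load_dataset('flights')
    fig, ax = plt.subplots(figsize=(12, 5))
    sns.barplot(x='month', y='passengers', data=flights,
                errorbar=sample_sd_interval, capsize=0.1, ax=ax)
    plt.tight_layout()
    plt.show()
```

Calling `plot_with_sample_sd()` should then give bars whose error intervals line up with the pandas ones from the comparison above. On older seaborn versions that only support `ci='sd'`, drawing the bars manually with `ax.errorbar`, as in the code above, remains the fallback.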
