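I'm running some Python statsmodels code in a Docker container. When I run this code on two different computers (using the same Docker image pulled from DockerHub, not built locally twice), I get different results. The differences are tiny: they show up in the 10th or 15th digit. But they are breaking our reproducible builds. Is this a Python statsmodels issue? A Docker issue?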

I think it's Python, because more than 1000 other lines of code run in containers created from these Docker images, and their results are reproducible.

Here is an MWE, along with an example of the differences:

import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
np.random.seed(42)

# Two columns of uniform random values in [0, 1)
df = pd.DataFrame(columns=['foo', 'bar'], data=np.random.random((1000, 2)))

# Ordinary least squares fit of bar against log10(foo), with an intercept
y = (df['bar'])
X = np.log10(df['foo'])
X = sm.add_constant(X)
model = sm.OLS(y, X)
fits = model.fit()
predictions = fits.predict(X)

# Predictions and prediction intervals on an evenly spaced grid
XX = np.linspace(X['foo'].min(), X['foo'].max(), 50)
XX = sm.add_constant(XX)
yy = fits.predict(XX)
sdev, lower, upper = wls_prediction_std(fits, exog=XX, alpha=0.05)

# With this data every row satisfies bar < 50, so the whole column is selected
bad = df.loc[df['bar'] < 50,'bar']

# Overwrite the selected 'bar' values with model predictions
df.loc[df['bar'] < 50,'bar'] = fits.predict(sm.add_constant(np.log10(bad)))

fits.summary()

# Write the regression summary and the modified data to disk
with open("output.txt", "w") as text_file:
    text_file.write(fits.summary().as_csv())

df.to_csv('out.csv', index=False)

The differences in out.csv are tiny. For example,

$ sdiff <(cat out.csv) <(ssh remote_server cat out.csv) | tail

shows the following. Note that only the last digit changes.

0.18610141784627732,0.5081884090422659                        | 0.18610141784627732,0.5081884090422658
0.45818688673789265,0.5082792408801786                        | 0.45818688673789265,0.5082792408801785
0.13347997241594378,0.5085994020210153                        | 0.13347997241594378,0.5085994020210152
0.7279393069737652,0.5082743139146337                         | 0.7279393069737652,0.5082743139146336
0.43685070261517955,0.5082054932289445                        | 0.43685070261517955,0.5082054932289444
0.7655128989911097,0.5084780190581778                         | 0.7655128989911097,0.5084780190581777
0.6102251494776413,0.5085067071667805                         | 0.6102251494776413,0.5085067071667804
0.7513750860290457,0.5082242252400639                           0.7513750860290457,0.5082242252400639
0.956614621083458,0.5086273010565618                            0.956614621083458,0.5086273010565618
0.05705472115125432,0.5083753342014574                        | 0.05705472115125432,0.5083753342014573
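
To put a number on "only the last digit changes", here is a minimal sketch that counts how many representable doubles lie between two values (their distance in units in the last place, ULPs). The helper name `ulp_distance` is hypothetical and not part of the MWE, and the sample pairs are copied from the diff above. It assumes finite IEEE 754 doubles and does not handle NaN.

```python
import struct


def ulp_distance(a: float, b: float) -> int:
    """Count the representable doubles between a and b (0 if they are equal).

    Hypothetical helper for illustration; assumes finite IEEE 754 doubles.
    """
    def ordered(x: float) -> int:
        # Reinterpret the 64 bits of the double as a signed integer.
        (bits,) = struct.unpack("<q", struct.pack("<d", x))
        # Negative floats have the sign bit set; flip them so that
        # integer ordering matches floating-point ordering.
        return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

    return abs(ordered(a) - ordered(b))


# A few (local, remote) pairs of the second column, copied from the diff above.
pairs = [
    (0.5081884090422659, 0.5081884090422658),
    (0.5082792408801786, 0.5082792408801785),
    (0.5082242252400639, 0.5082242252400639),
]

for local, remote in pairs:
    print(local, remote, ulp_distance(local, remote))
```

A distance of 0 or 1 for every pair would mean the two machines agree to within the last bit of double precision, which is the kind of difference that rounding order alone can produce.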