I have a large, real-valued, one-dimensional dataset called r. I want to plot:

mean(log(1+a*r)) vs a, with a > -1 . 

Here is my code:

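   import math
   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt

   rr=pd.read_csv('goog.csv')
   dd=rr['Close']
   series=pd.Series(dd)
   seriespct=series.pct_change()
   seriespct[0]=seriespct.mean()   # replace the leading NaN with the mean return

   dum1 =[0]*len(dd)

   a=1.
   a_max = 1.
   a_step = 0.01

   a = np.arange(-3.+a_step, a_max, a_step)
   n = len(a)
   dum2 =[0]*n
   m=len(dd)

   for j in range(n):
      for i in range(m):
         dum1[i]=math.log(1+a[j]*seriespct[i])
      dum2[j]=np.mean(dum1)   # one mean per value of a, so this belongs inside the outer loop


   plt.plot(a,dum2)
   plt.show()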

How can I do this in a more elegant way?

2 Answers

I would recommend this:

plt.plot(a, np.log(1 + r*a[:,None]).mean(1))

This has a significant speed advantage because it avoids the Python for loop. If your dataset is large, loops done inside numpy are noticeably faster:

In [49]: a = np.arange(a_step-.3, a_max, a_step)

In [50]: r = np.random.random(100)

In [51]: timeit [scipy.mean(log(1+a[i]*r)) for i in range(len(a))]
100 loops, best of 3: 5.47 ms per loop

In [52]: timeit np.log(1 + r*a[:,None]).mean(1)
1000 loops, best of 3: 384 µs per loop

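It works by broadcasting: a varies along one axis and r along the other. You then take the mean along the axis where r varies, so you are left with an array that still varies with a (and has the same shape as the original a):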

import numpy as np
import matplotlib.pyplot as plt

r = np.random.random(100)

a = 1.
a_max = 1.
a_step = 0.01
a = np.arange(a_step-.3, a_max, a_step)
a.shape
#(129,)
a = a[:,None] #adds a new axis, making this a column vector, same as: a = a.reshape(-1,1)
a.shape
#(129, 1)
(a*r).shape
#(129, 100)
loga = np.log(1 + a*r)
loga.shape
#(129,100)
mloga = loga.mean(axis=1) #take the mean along the 2nd axis (axis 1), where `r` varies
mloga.shape
#(129,)

plt.plot(a, mloga)
plt.show()
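
If you want to convince yourself that the broadcast version computes the same numbers as the explicit loop, a quick check along these lines should do. This is only a sketch: the seeded random `r` is a stand-in for your real data, and the names `loop_result` and `vec_result` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, for illustration only
r = rng.random(100)                   # stand-in data in [0, 1)
a = np.arange(-0.29, 1.0, 0.01)       # same kind of range as above; 1 + a*r stays positive

# explicit Python loop: one mean per value of a
loop_result = np.array([np.mean(np.log(1 + ai * r)) for ai in a])

# broadcasting: rows follow a, columns follow r, then average over the columns
vec_result = np.log(1 + a[:, None] * r).mean(axis=1)

same_shape = vec_result.shape == a.shape
same_values = np.allclose(loop_result, vec_result)
print(same_shape, same_values)
```

Here `a` is kept one-dimensional and the extra axis is added only inside the expression, so the result lines up with `a` directly.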

Addendum:

If you would rather not rely on broadcasting, you can use np.outer:

plt.plot(a, np.log(1 + np.outer(a,r)).mean(1))

There is no need to reshape a in that case (skip the step a = a[:,None]).

Here is a simpler example so you can see what is happening:

r = np.exp(np.arange(1,5))
a = np.arange(5)

In [33]: r
Out[33]: array([  2.71828183,   7.3890561 ,  20.08553692,  54.59815003])

In [34]: a
Out[34]: array([0, 1, 2, 3, 4])

In [39]: r*a[:,None]
Out[39]: 
# this is  2.7...         7.3...        20.08...       54.5...         # times:
array([[   0.        ,    0.        ,    0.        ,    0.        ],   # 0
       [   2.71828183,    7.3890561 ,   20.08553692,   54.59815003],   # 1
       [   5.43656366,   14.7781122 ,   40.17107385,  109.19630007],   # 2
       [   8.15484549,   22.1671683 ,   60.25661077,  163.7944501 ],   # 3
       [  10.87312731,   29.5562244 ,   80.34214769,  218.39260013]])  # 4

In [40]: np.outer(a,r)
Out[40]: 
array([[   0.        ,    0.        ,    0.        ,    0.        ],
       [   2.71828183,    7.3890561 ,   20.08553692,   54.59815003],
       [   5.43656366,   14.7781122 ,   40.17107385,  109.19630007],
       [   8.15484549,   22.1671683 ,   60.25661077,  163.7944501 ],
       [  10.87312731,   29.5562244 ,   80.34214769,  218.39260013]])

# this is the mean of each row (one value per element of a):
In [41]: (np.outer(a,r)).mean(1)
Out[41]: array([  0.        ,  21.19775622,  42.39551244,  63.59326866,  84.79102488])

# and the log of 1 + the above is:
In [42]: np.log(1+(np.outer(a,r)).mean(1))
Out[42]: array([ 0.        ,  3.09999121,  3.77035604,  4.16811021,  4.4519144 ])
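
Note that the last step above takes the log of the mean, which is only there to show the shapes. The quantity in the question is the mean of the log, and the two are not the same. A small sketch on the same toy arrays (the names `mean_of_log` and `log_of_mean` are just for illustration):

```python
import numpy as np

r = np.exp(np.arange(1, 5))
a = np.arange(5)

prod = np.outer(a, r)                          # shape (5, 4): rows follow a, columns follow r

mean_of_log = np.log(1 + prod).mean(axis=1)    # what the question asks for
log_of_mean = np.log(1 + prod.mean(axis=1))    # what Out[42] shows

print(mean_of_log)
print(log_of_mean)
```

Because log is concave, the mean of the log can never exceed the log of the mean, so for plotting mean(log(1+a*r)) you want the first form.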
answered 2013-10-02T15:47:17.840

You can use numpy to compute the means.

You can use matplotlib to do the plotting.

import numpy as np
from matplotlib import pyplot

#convert r from a python list to a 1-D array
r = np.array(r)

#edit these
a_max = 100
a_step = 0.1

a = np.arange(-1+a_step, a_max, a_step)
n = len(a)

#one mean of log(1+a*r) per value of a
pyplot.plot(a, [np.mean(np.log(1+a[i]*r)) for i in range(n)], 'b-')
pyplot.show()
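
One thing to watch with a range of a this wide: log(1+a*r) is only defined where 1+a*r > 0, so if r contains negative values, large a will produce nan. A rough sketch of filtering a down to the usable range first, assuming made-up returns in `r` (the names `valid`, `a_valid` and `mean_log` are hypothetical, not part of the answer above):

```python
import numpy as np

r = np.array([0.02, -0.01, 0.005, -0.03, 0.015])   # made-up returns, for illustration
a_max = 100
a_step = 0.1
a = np.arange(-1 + a_step, a_max, a_step)

# keep only the values of a for which 1 + a*r > 0 for every sample in r
valid = (1 + np.outer(a, r)).min(axis=1) > 0
a_valid = a[valid]

# log1p(x) computes log(1 + x); average over the samples for each remaining a
mean_log = np.log1p(np.outer(a_valid, r)).mean(axis=1)
```

You would then plot `a_valid` against `mean_log` instead of the full range.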
answered 2013-10-02T14:22:09.853