
The table below shows an example of a strategy where a signal is generated in row 2 and then an opposite signal is generated in row 5.

     row     open_signal     close_signal      live
      1           0               0             0
      2           1               0             1
      3           0               0             1
      4           0               0             1
      5           0               1             0
      6           0               0             0

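I want to optimise the calculation of the live column.

Is there a way to vectorize this problem in Pandas or NumPy to improve performance, producing the same result as the for loop example below?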

import pandas as pd
from datetime import datetime

example = {'date': [str(datetime(2017,1,1)), str(datetime(2017,1,2)),str(datetime(2017,1,3)),str(datetime(2017,1,4)),str(datetime(2017,1,5)),str(datetime(2017,1,6)),
                    str(datetime(2017,1,7)), str(datetime(2017,1,8)),str(datetime(2017,1,9)), str(datetime(2017,1,10)),str(datetime(2017,1,11)), str(datetime(2017,1,12)),
                    str(datetime(2017,1,13)),str(datetime(2017,1,14))],
           'open':        [142.11, 142.87, 141.87, 142.11, 142.00, 142.41, 142.50, 142.75, 140.87, 141.25, 141.10, 141.15, 142.55, 142.75],
           'close':       [142.87, 141.87, 142.11, 142.00, 142.41, 142.50, 142.75, 140.87, 141.25, 141.10, 141.15, 142.55, 142.75, 142.11],
           'open_signal': [False,  False,  False,  False,  False,  True,  False,  False,  False,  False,  False,  False,  False,  False],
           'close_signal':[False,  False,  False,  False,  False,  False,  False,  False,  False,   True,  False,  False,  False,  False]
           }

data = pd.DataFrame(example)

in_trade = False
for i in data.iterrows():
    if i[1].open_signal:
        in_trade = True
    if i[1].close_signal:
        in_trade = False
    data.loc[i[0],'in_trade'] = in_trade

2 Answers


Simple case

For the simple case in the posted sample, here's one vectorized way with NumPy -

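import numpy as np

# +1 on rows with an open signal, -1 on rows with a close signal
ar = np.zeros(len(data), dtype=int)
ar[data.open_signal.values] = 1
ar[data.close_signal.values] = -1
# the running sum is nonzero exactly while a trade is open
data['out'] = ar.cumsum().astype(bool)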
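
To see the trick in isolation, here is a minimal self-contained sketch run on the six-row table from the question. The helper name `live_from_signals` is made up for illustration and is not part of the original answer. It assumes every open signal is followed by exactly one close signal.

```python
import numpy as np

def live_from_signals(open_signal, close_signal):
    """Hypothetical helper: +1 at each open, -1 at each close, then a running sum."""
    open_signal = np.asarray(open_signal, dtype=bool)
    close_signal = np.asarray(close_signal, dtype=bool)
    steps = np.zeros(len(open_signal), dtype=int)
    steps[open_signal] = 1
    steps[close_signal] = -1
    # The running sum is 1 between an open and its close, 0 otherwise.
    return steps.cumsum().astype(bool)

# Rows 1..6 of the table in the question.
open_signal = [0, 1, 0, 0, 0, 0]
close_signal = [0, 0, 0, 0, 1, 0]

live = live_from_signals(open_signal, close_signal)
print(live.astype(int))  # [0 1 1 1 0 0]
```

The printed values match the `live` column of the table.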

Runtime test -

Using the sample dataset and scaling it 100000x along the rows for testing.

In [191]: data = pd.concat([data]*100000,axis=0); data.index = range(len(data))

# @Dark's soln with int output
In [192]: %timeit data['new'] = data['open_signal'].cumsum().ne(data['close_signal'].cumsum()).astype(int)
100 loops, best of 3: 13.4 ms per loop

# @Dark's soln with bool output
In [194]: %timeit data['new'] = data['open_signal'].cumsum().ne(data['close_signal'].cumsum()).astype(bool)
100 loops, best of 3: 10 ms per loop

# Proposed in this post
In [195]: %%timeit
     ...: ar = np.zeros(len(data), dtype=int)
     ...: ar[data.open_signal.values] = 1
     ...: ar[data.close_signal.values] = -1
     ...: data['out'] = ar.cumsum().astype(bool)
100 loops, best of 3: 7.52 ms per loop

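Generic case

Now, to solve for a generic case where we might have:

1] A close signal coming without a prior open signal.

2] Multiple open signals before the next close signal.

3] Multiple close signals before the next open signal.

We need a few more steps.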
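
A small sketch of why the simple running-sum approach breaks on such inputs. The signal arrays here are invented for illustration: two open signals come before a single close, so the running sum never gets back to zero.

```python
import numpy as np

# Hypothetical signals: opens on positions 1 and 2, one close on position 4.
open_signal = np.array([False, True, True, False, False, False])
close_signal = np.array([False, False, False, False, True, False])

# Simple-case approach: +1 / -1 markers followed by a running sum.
steps = np.zeros(len(open_signal), dtype=int)
steps[open_signal] = 1
steps[close_signal] = -1
naive = steps.cumsum().astype(bool)

# Reference: the explicit state machine from the question's loop.
expected = []
in_trade = False
for o, c in zip(open_signal, close_signal):
    if o:
        in_trade = True
    if c:
        in_trade = False
    expected.append(in_trade)
expected = np.array(expected)

print(steps.cumsum())         # [0 1 2 2 1 1]
print(naive.astype(int))      # [0 1 1 1 1 1]
print(expected.astype(int))   # [0 1 1 1 0 0]
```

The running sum reaches 2 and only drops to 1 on the close, so the simple version reports an open trade after it has ended.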

Approach #1: Here's one based on searchsorted -

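# row positions of the open and close signals
s0 = np.flatnonzero(data.open_signal.values)
s1 = np.flatnonzero(data.close_signal.values)

# keep only the first open signal in each run between closes
idx0 = np.searchsorted(s1,s0,'right')
s0c = s0[np.r_[True,idx0[1:] > idx0[:-1]]]

# keep only the first close signal in each run between the kept opens
idx1 = np.searchsorted(s0c,s1,'right')
s1c = s1[np.r_[True,idx1[1:] > idx1[:-1]]]

ar = np.zeros(len(data), dtype=int)
ar[s0c] = 1
ar[s1c] = -1
# a close before the first open has nothing to close, so ignore it
if s1c[0] < s0c[0]:
    ar[s1c[0]] = 0
data['out'] = ar.cumsum().astype(bool)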

Sample output -

In [360]: data
Out[360]: 
    close_signal  open_signal    out
0          False        False  False
1          False        False  False
2           True        False  False
3          False        False  False
4          False        False  False
5          False         True   True
6          False        False   True
7          False         True   True
8          False        False   True
9           True        False  False
10         False        False  False
11          True        False  False
12         False        False  False
13         False        False  False
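
The sample output above can be reproduced with a self-contained sketch like the one below. The wrapper name `live_searchsorted` and the two guards for empty inputs are additions for illustration, not part of the original answer. The sketch assumes that no row carries both an open and a close signal.

```python
import numpy as np

def live_searchsorted(open_signal, close_signal):
    """Hypothetical wrapper around the searchsorted approach."""
    open_signal = np.asarray(open_signal, dtype=bool)
    close_signal = np.asarray(close_signal, dtype=bool)
    s0 = np.flatnonzero(open_signal)
    s1 = np.flatnonzero(close_signal)
    ar = np.zeros(len(open_signal), dtype=int)
    if len(s0) == 0:
        # Added guard: without any open signal no trade ever starts.
        return ar.astype(bool)

    # Keep only the first open in each run between closes.
    idx0 = np.searchsorted(s1, s0, 'right')
    s0c = s0[np.r_[True, idx0[1:] > idx0[:-1]]]
    ar[s0c] = 1

    if len(s1) > 0:  # added guard: the trade may never be closed
        # Keep only the first close in each run between the kept opens.
        idx1 = np.searchsorted(s0c, s1, 'right')
        s1c = s1[np.r_[True, idx1[1:] > idx1[:-1]]]
        ar[s1c] = -1
        if s1c[0] < s0c[0]:
            ar[s1c[0]] = 0  # a close before the first open is ignored
    return ar.cumsum().astype(bool)

# Signals from the sample output: closes on rows 2, 9, 11; opens on rows 5, 7.
n = 14
open_signal = np.zeros(n, dtype=bool)
close_signal = np.zeros(n, dtype=bool)
open_signal[[5, 7]] = True
close_signal[[2, 9, 11]] = True

out = live_searchsorted(open_signal, close_signal)
print(np.flatnonzero(out))  # [5 6 7 8]
```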

Approach #2: Possibly a faster one, as we avoid searchsorted and instead make use of masking -

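# +1 where a trade opens, -1 where it closes, 0 elsewhere
mix_arr = data.open_signal.values.astype(int) - data.close_signal.values
ar = np.zeros(len(data), dtype=int)
mix_mask = mix_arr!=0
mix_val = mix_arr[mix_mask]

# drop a signal that repeats the direction of the previous signal
valid_mask = np.r_[True, mix_val[1:] != mix_val[:-1]]
ar[mix_mask] = mix_arr[mix_mask]*valid_mask
# a leading close has no open trade to close, so ignore it
if mix_val[0] == -1:
    ar[mix_mask.argmax()] = 0

data['out'] = ar.cumsum().astype(bool)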
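
As a rough check, the masking approach can be compared against the explicit loop from the question. The names `live_masked` and `live_loop` and the empty-input guard are illustrative additions. The sketch assumes that no row carries both signals at once: such a row cancels to 0 here, whereas the loop would treat it as a close.

```python
import numpy as np

def live_masked(open_signal, close_signal):
    """Hypothetical wrapper around the masking approach."""
    open_signal = np.asarray(open_signal, dtype=bool)
    close_signal = np.asarray(close_signal, dtype=bool)
    mix_arr = open_signal.astype(int) - close_signal.astype(int)
    ar = np.zeros(len(mix_arr), dtype=int)
    mix_mask = mix_arr != 0
    mix_val = mix_arr[mix_mask]
    if len(mix_val) == 0:
        return ar.astype(bool)  # added guard: no signals at all
    # Drop any signal that repeats the direction of the previous one.
    valid_mask = np.r_[True, mix_val[1:] != mix_val[:-1]]
    ar[mix_mask] = mix_val * valid_mask
    if mix_val[0] == -1:
        ar[mix_mask.argmax()] = 0  # leading close is ignored
    return ar.cumsum().astype(bool)

def live_loop(open_signal, close_signal):
    """Reference: the state machine from the question's for loop."""
    out, in_trade = [], False
    for o, c in zip(open_signal, close_signal):
        if o:
            in_trade = True
        if c:
            in_trade = False
        out.append(in_trade)
    return np.array(out, dtype=bool)

# Same signals as the sample output: closes on rows 2, 9, 11; opens on rows 5, 7.
n = 14
open_signal = np.zeros(n, dtype=bool)
close_signal = np.zeros(n, dtype=bool)
open_signal[[5, 7]] = True
close_signal[[2, 9, 11]] = True

masked = live_masked(open_signal, close_signal)
looped = live_loop(open_signal, close_signal)
print(np.array_equal(masked, looped))  # True
```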
Answered 2017-12-09T06:14:58.477

We can compare the cumulative sums, i.e.

data['new'] = data['open_signal'].cumsum().ne(data['close_signal'].cumsum()).astype(int)

  row  open_signal  close_signal  live  new
0    1            0             0     0    0
1    2            1             0     1    1
2    3            0             0     1    1
3    4            0             0     1    1
4    5            0             1     0    0
5    6            0             0     0    0
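
For anyone who wants to run this directly, here is a minimal sketch that rebuilds the six-row table from the question and applies the same one-liner. The frame name `df` is an arbitrary choice for illustration. The comparison works here because there is exactly one open followed by one close.

```python
import pandas as pd

# The six-row table from the question, with the hand-filled live column.
df = pd.DataFrame({
    'row':          [1, 2, 3, 4, 5, 6],
    'open_signal':  [0, 1, 0, 0, 0, 0],
    'close_signal': [0, 0, 0, 0, 1, 0],
    'live':         [0, 1, 1, 1, 0, 0],
})

# A trade is live while more opens than closes have been seen so far.
opens_so_far = df['open_signal'].cumsum()
closes_so_far = df['close_signal'].cumsum()
df['new'] = opens_so_far.ne(closes_so_far).astype(int)

print(df['new'].tolist())  # [0, 1, 1, 1, 0, 0]
```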
Answered 2017-12-09T06:21:31.830