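I know that vectorized functions are the preferred way to write code for speed, but I can't figure out a way to do what this function does without loops. The way I wrote this function makes it extremely slow. (Passing two DataFrames with 100 columns and 2000 rows as arguments, it takes 100 seconds to complete. I was hoping for something more like 1 second.)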
import numpy as np
import pandas as pd

def gen_fuzz_logic_signal(longp, shortp):
    # Input dataframes should have 0, -1, or 1 value
    flogic_signal = pd.DataFrame(index = longp.index, columns = longp.columns)
    for sym in longp.columns:
        print(sym)
        # Direction of the most recent entry: +1 long, -1 short, 0 none yet
        prev_enter = 0
        for inum in range(0, len(longp.index)):
            cur_val = np.nan
            # No active signal: report half strength in the last entry direction
            if longp.ix[inum, sym] == 0 and prev_enter == +1:
                cur_val = 0.5
            if shortp.ix[inum, sym] == 0 and prev_enter == -1:
                cur_val = -0.5
            if longp.ix[inum, sym] == 1 and shortp.ix[inum, sym] == -1:
                # Both signals active: the one that just switched on wins
                if longp.ix[inum - 1, sym] != 1:
                    cur_val = 1
                    prev_enter = 1
                elif shortp.ix[inum - 1, sym] != -1:
                    cur_val = -1
                    prev_enter = -1
                else:
                    cur_val = prev_enter
            else:
                if longp.ix[inum, sym] == 1:
                    cur_val = 1
                    prev_enter = 1
                if shortp.ix[inum, sym] == -1:
                    cur_val = -1
                    prev_enter = -1
            flogic_signal.ix[inum, sym] = cur_val
    return flogic_signal
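The inputs to the function are just two DataFrames with values of 1, -1, or 0. If anyone has ideas on how to vectorize this or otherwise speed it up, I would greatly appreciate it. I tried replacing ".ix[inum, sym]" with "[sym][inum]", but that was even slower.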
date        GOOG longp   GOOG shortp   GOOG func result
2011-07-28 0 -1 -1
2011-07-29 0 -1 -1
2011-08-01 0 -1 -1
2011-08-02 0 -1 -1
2011-08-03 0 -1 -1
2011-08-04 0 -1 -1
2011-08-05 0 -1 -1
2011-08-08 0 0 -0.5
2011-08-09 0 0 -0.5
2011-08-10 0 0 -0.5
2011-08-11 0 0 -0.5
2011-08-12 1 0 1
2011-08-15 1 0 1
2011-08-16 1 0 1
2011-08-17 1 0 1
2011-08-18 1 0 1
2011-08-19 1 0 1
2011-08-22 1 0 1
2011-08-23 1 0 1
2011-08-24 1 0 1
2011-08-25 1 0 1
2011-08-26 1 0 1
2011-08-29 1 0 1
2011-08-30 1 0 1
2011-08-31 1 0 1
2011-09-01 1 0 1
2011-09-02 1 0 1
2011-09-06 1 0 1
2011-09-07 1 0 1
2011-09-08 1 0 1
2011-09-09 1 0 1
2011-09-12 1 0 1
2011-09-13 1 0 1
2011-09-14 1 0 1
2011-09-15 1 0 1
2011-09-16 1 0 1
2011-09-19 1 0 1
2011-09-20 1 0 1
2011-09-21 1 0 1
2011-09-22 1 0 1
2011-09-23 1 0 1
2011-09-26 1 0 1
2011-09-27 1 0 1
2011-09-28 1 0 1
2011-09-29 0 0 0.5
2011-09-30 0 -1 -1
2011-10-03 0 -1 -1
2011-10-04 0 -1 -1
2011-10-05 0 -1 -1
2011-10-06 0 -1 -1
2011-10-07 0 -1 -1
2011-10-10 0 -1 -1
2011-10-11 0 -1 -1
2011-10-12 0 -1 -1
2011-10-13 0 -1 -1
2011-10-14 0 -1 -1
2011-10-17 0 -1 -1
2011-10-18 0 -1 -1
2011-10-19 0 -1 -1
2011-10-20 0 -1 -1
date        IBM longp   IBM shortp   IBM func result
2012-05-01 1 -1 1
2012-05-02 1 -1 1
2012-05-03 1 -1 1
2012-05-04 1 -1 1
2012-05-07 1 -1 1
2012-05-08 1 0 1
2012-05-09 1 0 1
2012-05-10 1 0 1
2012-05-11 1 0 1
2012-05-14 1 0 1
2012-05-15 1 0 1
2012-05-16 0 -1 -1
2012-05-17 0 -1 -1
2012-05-18 0 -1 -1
2012-05-21 0 -1 -1
2012-05-22 0 -1 -1
2012-05-23 0 -1 -1
2012-05-24 0 -1 -1
2012-05-25 0 -1 -1
2012-05-29 0 -1 -1
2012-05-30 0 -1 -1
2012-05-31 0 -1 -1
2012-06-01 0 -1 -1
2012-06-04 0 -1 -1
2012-06-05 0 -1 -1
2012-06-06 0 -1 -1
2012-06-07 0 -1 -1
2012-06-08 1 -1 1
2012-06-11 1 -1 1
2012-06-12 1 -1 1
2012-06-13 1 -1 1
2012-06-14 1 -1 1
2012-06-15 1 -1 1
2012-06-18 1 -1 1
2012-06-19 1 -1 1
2012-06-20 1 -1 1
2012-06-21 1 0 1
2012-06-22 1 0 1
2012-06-25 1 0 1
2012-06-26 1 0 1
2012-06-27 1 0 1
2012-06-28 1 0 1
2012-06-29 1 0 1
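One possible direction for removing the inner loop, offered as a sketch rather than a drop-in replacement: the rows where `prev_enter` changes can be found from the inputs alone, so the running state can be rebuilt with a forward fill instead of a loop. The name `gen_fuzz_logic_signal_vec` is hypothetical, and the sketch assumes that the row "before" the first row is all zeros (the loop version instead reads index `inum - 1 = -1` on the first pass). The inputs in the usage lines are toy values shaped like the sample rows above, not the real data.

```python
import numpy as np
import pandas as pd


def gen_fuzz_logic_signal_vec(longp, shortp):
    """Loop-free sketch of gen_fuzz_logic_signal (hypothetical helper name)."""
    l = longp.values.astype(float)
    s = shortp.values.astype(float)

    # Previous-row values. ASSUMPTION: the row "before" the first row is all
    # zeros, so the first row never wraps around to the last one.
    pad = np.zeros((1, l.shape[1]))
    lp = np.vstack([pad, l])[:-1]
    sp = np.vstack([pad, s])[:-1]

    both = (l == 1) & (s == -1)
    # Rows where the loop would set prev_enter to +1 or -1. These depend only
    # on the inputs, not on the running state.
    enter_long = (both & (lp != 1)) | (~both & (l == 1))
    enter_short = (both & (lp == 1) & (sp != -1)) | (~both & (s == -1))

    # Carry the most recent entry down each column; 0 before any entry.
    event = np.where(enter_long, 1.0, np.where(enter_short, -1.0, np.nan))
    prev_enter = (
        pd.DataFrame(event, index=longp.index, columns=longp.columns)
        .ffill()
        .fillna(0.0)
        .values
    )

    # Start from NaN, apply the +/-0.5 rules, then let rows with an active
    # signal (long == 1 or short == -1) take the carried entry direction.
    out = np.full(l.shape, np.nan)
    out = np.where((l == 0) & (prev_enter == 1), 0.5, out)
    out = np.where((s == 0) & (prev_enter == -1), -0.5, out)
    out = np.where((l == 1) | (s == -1), prev_enter, out)
    return pd.DataFrame(out, index=longp.index, columns=longp.columns)


# Toy inputs (made up for illustration).
longp = pd.DataFrame({"GOOG": [0, 0, 0, 1, 1, 0, 0],
                      "IBM": [0, 1, 1, 1, 0, 0, 1]})
shortp = pd.DataFrame({"GOOG": [-1, -1, 0, 0, 0, 0, -1],
                       "IBM": [-1, -1, -1, 0, 0, -1, -1]})
result = gen_fuzz_logic_signal_vec(longp, shortp)
print(result)
# GOOG column: -1, -1, -0.5, 1, 1, 0.5, -1
# IBM column:  -1, 1, 1, 1, 0.5, -1, 1
```

The design choice here is to separate "when does the state change" from "what is the output given the state". Only the first part needed the loop, and it reduces to boolean masks plus `ffill`. It would be worth comparing the result against the loop version on real data before relying on it, especially for the first row.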
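Edit:
I just reran some old code that uses similar loops over a pandas DataFrame to set values. It used to take maybe 5 seconds; now it seems to take about 100 times longer. I wonder whether this is due to something that changed in a recent version of pandas. That is the only variable I can think of. See the code below. It takes 73 seconds to run on my computer with pandas 0.11. That seems very slow for such a basic function, even though it operates element by element. If anyone has a chance, I would be curious how long the following takes on your computer with your version of pandas.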
import time
import numpy as np
import pandas as pd

def timef(func, *args):
    # Run func twice and print the average time per call, in seconds
    start = time.clock()
    for i in range(2):
        func(*args)
    end = time.clock()
    time_complete = (end - start) / float(2)
    print(time_complete)

def tfunc(num_row, num_col):
    # Build an empty DataFrame, then set its cells one at a time with .ix
    df = pd.DataFrame(index = np.arange(1,num_row), columns = np.arange(1,num_col))
    for col in df.columns:
        for inum in range(1, len(df.index)):
            df.ix[inum, col] = 0 #np.nan
    return df

timef(tfunc, 1000, 1000)  # <<< This takes 73 seconds on a Core i5 M460 2.53 GHz Windows 7 laptop.
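As a point of comparison for the benchmark above, here is a minimal sketch of the same fill done on a plain NumPy array that is wrapped in a DataFrame once at the end. `tfunc_array` is a hypothetical name. The sketch assumes `.ix` treats `inum` as a label on this integer index, in which case `tfunc` never writes to the last row (its loop stops at label `len(df.index) - 1`), so the sketch leaves that row as NaN too. It stores floats rather than Python objects, which is a deliberate difference from `tfunc`.

```python
import time

import numpy as np
import pandas as pd


def tfunc_array(num_row, num_col):
    """Element-by-element fill on an ndarray (hypothetical comparison helper)."""
    index = np.arange(1, num_row)
    columns = np.arange(1, num_col)
    values = np.full((len(index), len(columns)), np.nan)
    for j in range(len(columns)):
        # Positions 0 .. len(index) - 2 correspond to the labels tfunc writes.
        for i in range(len(index) - 1):
            values[i, j] = 0
    # Build the DataFrame once, after all cells are set.
    return pd.DataFrame(values, index=index, columns=columns)


start = time.time()
df = tfunc_array(200, 200)
elapsed = time.time() - start
print(elapsed)
```

The loop is still element-wise, so any speed difference against `tfunc` would come from the per-cell cost of DataFrame indexing rather than from vectorization.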
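Edit 2, 7-9-13 1:23 PM:
I found a temporary solution! I changed the code to the following. Basically, I convert each column to an ndarray, assemble the new column in a Python list, and then insert it as a column into the new pandas DataFrame. The old version above took 101 seconds for 50 columns of about 2000 rows. The version below takes only 0.19 seconds! That is fast enough for me. I don't know why .ix is so slow. As I said above, I believe element-wise operations were much faster in earlier versions of pandas.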
def gen_fuzz_logic_signal3(longp, shortp):
    # Input dataframes should have 0, -1, or 1 value
    flogic_signal = pd.DataFrame(index = longp.index, columns = longp.columns)
    for sym in longp.columns:
        # Pull each column out as a plain ndarray once, outside the inner loop
        coll = longp[sym].values
        cols = shortp[sym].values
        prev_enter = 0
        newcol = [None] * len(coll)
        # Starts at 1, so row 0 stays None and coll[inum - 1] never wraps around
        for inum in range(1, len(coll)):
            cur_val = np.nan
            if coll[inum] == 0 and prev_enter == +1:
                cur_val = 0.5
            if cols[inum] == 0 and prev_enter == -1:
                cur_val = -0.5
            if coll[inum] == 1 and cols[inum] == -1:
                if coll[inum -1] != 1:
                    cur_val = 1
                    prev_enter = 1
                elif cols[inum-1] != -1:
                    cur_val = -1
                    prev_enter = -1
                else:
                    cur_val = prev_enter
            else:
                if coll[inum] == 1:
                    cur_val = 1
                    prev_enter = 1
                if cols[inum] == -1:
                    cur_val = -1
                    prev_enter = -1
            newcol[inum] = cur_val
        # Assign the finished column in one step
        flogic_signal[sym] = newcol
    return flogic_signal