In Polars (Python), I can run the lazy query below. It takes about 17 ms, almost the same as the eager version. The data has about 100,000 rows.
Data sample:
code  date        open    close   change_predict  factor  factor_cta
A     2010-01-04  4080.0  4057.0  False           16.0    1.0
B     2010-01-04  4067.0  4066.0  False           16.0    1.0
A     2010-01-05  4066.0  4154.0  False           17.0    1.0
B     2010-01-05  4165.0  4044.0  False           18.0    1.0
A     2010-01-08  4040.0  3981.0  False           17.0    1.0
# Python lazy mode
xx = data.lazy().groupby('date').agg([
    pl.col("code"),
    pl.col("open"),
    pl.col("close"),
    pl.col("change_predict"),
    pl.col("code").is_in(pl.col("code").sort_by('factor').head(5).filter(pl.col("factor_cta")==1)).alias('buy'),
    pl.col("code").is_in(pl.col("code").sort_by('factor').tail(5).filter(pl.col("factor_cta")==0)).alias('sell')
]).sort('date').explode(pl.exclude('date'))
xx = xx.collect()
# 17.8 ms ± 62.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Python eager mode
x = data.groupby('date').agg([
    pl.col("code"),
    pl.col("open"),
    pl.col("close"),
    pl.col("change_predict"),
    pl.col("code").is_in(pl.col("code").sort_by('factor').head(5).filter(pl.col("factor_cta")==1)).alias('buy'),
    pl.col("code").is_in(pl.col("code").sort_by('factor').tail(5).filter(pl.col("factor_cta")==0)).alias('sell')
]).sort('date').explode(pl.exclude('date'))
# 17.7 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
But in Polars (Rust), why is the same lazy operation slower, even when built in release mode?
// Rust lazy mode
let mut sw = stopwatch::Stopwatch::new();
sw.restart();
let x = data
    .lazy()
    .groupby([col("date")])
    .agg([
        col("code"),
        col("open"),
        col("close"),
        col("change_predict"),
        col("code").is_in(col("code").sort_by([col("factor")], [false]).head(Some(5)).filter(col("factor_cta").eq(lit(1)))).alias("buy"),
        col("code").is_in(col("code").sort_by([col("factor")], [false]).tail(Some(5)).filter(col("factor_cta").eq(lit(0)))).alias("sell"),
    ])
    // agg on a lazy groupby returns a LazyFrame, so there is nothing to unwrap here
    .sort("date", false)
    .explode([col("*").exclude(["date"])])
    // collect() executes the query plan and returns a Result holding the DataFrame
    .collect()
    .unwrap();
println!("Groupby Date Success {:#?}", sw.elapsed());
// Groupby Date Success 51.4484ms
shape: (102238, 7)
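One caveat about the numbers: the Python figures are means over 7 runs of 100 loops each, while the Rust figure comes from a single run. Below is a minimal sketch of a more comparable measurement in Rust, using only `std::time::Instant` instead of the stopwatch crate. The `mean_time` helper and the placeholder workload are hypothetical; the closure would need to wrap the actual query, including `.collect()`.

```rust
use std::time::{Duration, Instant};

/// Calls `work` `runs * loops` times and returns the mean duration of one call,
/// loosely mirroring the "mean of 7 runs, 100 loops each" figure from the Python side.
fn mean_time<F: FnMut()>(mut work: F, runs: u32, loops: u32) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        for _ in 0..loops {
            work();
        }
        total += start.elapsed();
    }
    total / (runs * loops)
}

fn main() {
    // Placeholder workload (an assumption for illustration only);
    // replace the closure body with the lazy query and its `.collect()`.
    let mut sink = 0u64;
    let mean = mean_time(
        || {
            let s: u64 = (0..10_000u64).sum();
            // black_box keeps the compiler from optimizing the work away
            sink = sink.wrapping_add(std::hint::black_box(s));
        },
        7,
        100,
    );
    println!("mean per loop: {:?} (checksum {})", mean, sink);
}
```

Averaging over many iterations keeps one-off costs in the first call from dominating the result. It would not by itself explain a consistent gap between the two languages.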
Also, it seems that the non-lazy groupby.agg in Polars (Rust) cannot handle complex expressions the way the Python version does?