Introduction

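I have a function that takes a date as input, spends some time computing (represented here by Sys.sleep()), strips every '-' from the date, and returns a character string: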

library(magrittr)

auxialiaryCompute = function(vDate)
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
}

> auxialiaryCompute(as.Date("2015-01-14"))
[1] "20150114"

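Cool. The output above is '20150114'. Now I want this function to also take into account the previous day's output, or the previous two days', or the previous n days', going back no further than a cutoff date called loopBackMaxDate.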

Naive recursion

Here is one possible recursive implementation:

compute = function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias

    dates = Filter(function(x) x>d(loopBackMaxDate), 
                   getPreviousDates(loopBackDays, d(vDate))) 

    if(length(dates)==0)
        return(auxialiaryCompute(vDate=vDate, previousOutputs=list()))

    previousOutputs = lapply(dates, function(u) compute(u, loopBackMaxDate, loopBackDays))

    auxialiaryCompute(vDate=vDate, previousOutputs=previousOutputs)
}

auxialiaryCompute = function(vDate, previousOutputs=list())
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
}

getPreviousDates = function(loopBackDays, vDate)
{
    if(loopBackDays==0) return()
    seq.Date(from=vDate-loopBackDays, to=vDate-1, by="days")
}

With this, I get the same result as before (it takes about 1 second):

> compute(as.Date("2015-01-14"))
[1] "20150114"

The following takes 4 seconds:

> system.time(compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1))
   user  system elapsed 
   0.00    0.00    3.99 

Next I want to compute the following, which takes 3 seconds:

> system.time(compute("2014-05-04", loopBackMaxDate="2014-05-01", loopBackDays=1))
   user  system elapsed 
   0.02    0.00    3.01 

This is really bad, because I am recomputing the results for vDate="2014-05-04", vDate="2014-05-03" and vDate="2014-05-02", which were already computed during the call compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1)...
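To make the redundancy concrete, here is a minimal sketch that mirrors the recursion above but replaces Sys.sleep(1) with a call counter. The names nCalls, auxCount, prevDates and computeCount are hypothetical stand-ins introduced only for this illustration.

```r
# Hypothetical counter-based copy of the naive recursion (no sleeping).
nCalls <- 0

auxCount <- function(vDate, previousOutputs = list()) {
    nCalls <<- nCalls + 1                      # one "expensive" evaluation
    gsub("-", "", as.character(vDate))
}

prevDates <- function(loopBackDays, vDate) {
    if (loopBackDays == 0) return(NULL)
    seq.Date(from = vDate - loopBackDays, to = vDate - 1, by = "days")
}

computeCount <- function(vDate, loopBackMaxDate = vDate, loopBackDays = 0) {
    limit <- as.Date(loopBackMaxDate)
    # Keep only the previous dates strictly after the cutoff.
    dates <- Filter(function(x) x > limit,
                    prevDates(loopBackDays, as.Date(vDate)))
    previousOutputs <- lapply(dates, function(u) {
        computeCount(u, limit, loopBackDays)
    })
    auxCount(vDate, previousOutputs)
}

r1 <- computeCount("2014-05-05", loopBackMaxDate = "2014-05-01", loopBackDays = 1)
callsAfterFirst <- nCalls    # 4: 05-05, 05-04, 05-03, 05-02

r2 <- computeCount("2014-05-04", loopBackMaxDate = "2014-05-01", loopBackDays = 1)
callsAfterSecond <- nCalls   # 7: the three dates from 05-04 back are evaluated again
```

The second call adds three evaluations that the first call had already performed, which matches the 4 s and 3 s timings above.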

Memoized recursion

Here is how I used memoise:

library(memoise)

compute = memoise(function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias

    dates = Filter(function(x) x>d(loopBackMaxDate), getPreviousDates(loopBackDays, d(vDate))) 

    if(length(dates)==0)
        return(auxialiaryCompute(vDate=vDate, previousOutputs=list()))

    previousOutputs = lapply(dates, function(u) compute(u, loopBackMaxDate, loopBackDays))

    auxialiaryCompute(vDate=vDate, previousOutputs=previousOutputs)
})

auxialiaryCompute = memoise(function(vDate, previousOutputs=list())
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
})

First run (as expected, it takes 4 seconds):

> system.time(compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1))
  user  system elapsed 
  0.00    0.00    4.01 

The second run takes 1 second, whereas I expected 0 seconds:

> system.time(compute("2014-05-04", loopBackMaxDate="2014-05-01", loopBackDays=1))
   user  system elapsed 
   0.00    0.00    0.99 

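I think I have gone completely wrong somewhere... I could store the outputs in a global variable, but I would really like to get this working with memoization or continuation-passing style, and avoid the redundant computation!

If anyone has an idea, I would be very grateful!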

1 Answer

OK, first, I added some loginfo calls around the auxiliaryCompute function:

library(logging)   # provides loginfo()
basicConfig()      # attach a console handler to the root logger

compute = memoise(function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias

    dates = Filter(function(x) x>d(loopBackMaxDate), getPreviousDates(loopBackDays, d(vDate))) 

    if(length(dates)==0)
    {
        loginfo("I reached the tail!")
        return(auxiliaryCompute(vDate2=vDate, previousOutputs=0))
    }

    previousOutputs = lapply(dates, function(u){
                    compute(vDate=u, loopBackMaxDate=loopBackMaxDate, loopBackDays)
                  })

    auxiliaryCompute(vDate2=vDate, previousOutputs=previousOutputs)
})

auxiliaryCompute = memoise(function(vDate2, previousOutputs)
{
    loginfo("-------arguments: vDate %s , previousOutputs %s", vDate2, paste(unlist(previousOutputs), collapse=","))
#   Sys.sleep(1)
    vDate2 %>% as.character %>% gsub("-", "", .)
})

> compute("2015-01-10", "2015-01-01", 2)
2015-01-20 18:53:12 INFO::I reached the tail!
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-02 , previousOutputs 0
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-03 , previousOutputs 20150102
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-04 , previousOutputs 20150102,20150103
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-05 , previousOutputs 20150103,20150104
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-06 , previousOutputs 20150104,20150105
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-07 , previousOutputs 20150105,20150106
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-08 , previousOutputs 20150106,20150107
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-09 , previousOutputs 20150107,20150108
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-10 , previousOutputs 20150108,20150109
[1] "20150110"

> compute("2015-01-08", "2015-01-01", 2)
2015-01-20 18:54:11 INFO::-------arguments: vDate 2015-01-08 , previousOutputs 20150106,20150107
[1] "20150108"

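The first log is fine: each date is visited exactly once (no repetition, thanks to memoise). What is strange is that in the second log auxiliaryCompute is called with the arguments vDate 2015-01-08 , previousOutputs 20150106,20150107, even though that call had already been executed (it appears in the first log).

The other dates were memoized correctly... only the first one misses the cache... This is because it is a character string, whereas the other dates inside the recursion have been coerced to Date.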
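The mismatch can be seen with base R alone. As far as I understand, memoise keys its cache on a hash of the arguments, so two arguments that are not identical as R objects end up under different keys; the serialize() comparison below is only a stand-in for that hashing, not memoise's actual code.

```r
s <- "2015-01-08"            # what the top-level call passes
d <- as.Date("2015-01-08")   # what the recursive calls pass

sameObject <- identical(s, d)          # FALSE: character vs Date
classes    <- c(class(s), class(d))    # "character" "Date"

# Stand-in for argument hashing: the serialized bytes differ too.
sameBytes <- identical(serialize(s, NULL), serialize(d, NULL))   # FALSE

# Yet both print the same way, which is what hides the problem in the log.
sameText <- as.character(d) == s       # TRUE
```

So the log line looks identical for both calls even though the cache sees two different arguments.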

Passing a Date as the argument makes it work:

> compute(as.Date("2015-01-08"), "2015-01-01", 2)
[1] "20150108"

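This is really sneaky, partly because R is not a strongly typed language, but mostly because I coded this very badly by mixing up dates and strings!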
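One way to guard against this kind of mix-up is to coerce the argument once, in a thin wrapper, before it reaches the cached function. The sketch below uses a hand-rolled, hypothetical memoizeSimple (base R only, keyed on the serialized arguments) instead of the memoise package so that it runs without extra dependencies; slowFormat, formatDate and calls are also made-up names.

```r
# Hypothetical minimal memoizer: results are stored in an environment,
# keyed on the serialized arguments (so a class difference changes the key).
memoizeSimple <- function(f) {
    cache <- new.env()
    function(...) {
        key <- paste(serialize(list(...), NULL), collapse = "")
        if (!exists(key, envir = cache, inherits = FALSE)) {
            assign(key, f(...), envir = cache)
        }
        get(key, envir = cache, inherits = FALSE)
    }
}

calls <- 0
slowFormat <- memoizeSimple(function(vDate) {
    calls <<- calls + 1                  # counts real (uncached) evaluations
    gsub("-", "", as.character(vDate))
})

# Thin wrapper: coerce to Date first, so strings and Dates share one cache entry.
formatDate <- function(vDate) slowFormat(as.Date(vDate))

a <- formatDate(as.Date("2015-01-08"))
b <- formatDate("2015-01-08")            # served from the cache
```

After both calls, calls is 1 and a and b are both "20150108". The same idea should carry over to compute: convert vDate and loopBackMaxDate with as.Date in an outer function and memoise only the inner one.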

Answered 2015-01-20T18:03:57.083