r - How can knitr's cached results be used to reproduce the environment in a given chunk?

Question

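tl;dr

My question: Is there some way, in an R session, to use knitr's cached results to "fast-forward" to the environment (i.e. the set of objects) available in a given code chunk, just as knit() itself does?

Setup:

knitr's built-in caching of code chunks is one of its killer features.

It is particularly useful when some chunks contain time-consuming calculations. Unless they (or a chunk they depend on) are altered, the calculations only need to be performed the first time the document is knit: on all subsequent calls to knit, the objects created by the chunk are simply loaded from the cache.

Here is a minimal example, a file called "lotsOfComps.Rnw":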

\documentclass{article}
\begin{document}

The calculations in this chunk take a looooong time.

<<slowChunk, cache=TRUE>>=
Sys.sleep(30)  ## Stands in for some time-consuming computation
x <- sample(1:10, size=2)
@

I wish I could `fast-forward' to this chunk, to view the cached value of 
\texttt{x}

<<interestingChunk>>=
y <- prod(x)^2
y
@

\end{document}

Time required to knit and TeXify "lotsOfComps.Rnw":

## First time
system.time(knit2pdf("lotsOfComps.Rnw"))
##   user  system elapsed
##   0.07    0.02   31.81

## Second (and subsequent) runs
system.time(knit2pdf("lotsOfComps.Rnw"))
##   user  system elapsed
##   0.03    0.02    1.28

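My question:

Is there some way, in an R session, to use knitr's cached results to "fast-forward" to the environment (i.e. the set of objects) available in a given code chunk, just as knit() itself does?

Running purl("lotsOfComps.Rnw") and then sourcing the code in "lotsOfComps.R" won't work, because all of the objects along the way have to be recomputed.

Ideally, it would be possible to do something like this to end up in the environment that exists at the beginning of <<interestingChunk>>=: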

spin("lotsOfComps.Rnw", chunk="interestingChunk")
ls()
# [1] "x"
x
# [1] 3 8

Since spin() is not (yet?) available, what is the best way to get the equivalent result?

score 6 · Accepted Answer

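This has to be one of the ugliest kludges I've written in a while...

The basic idea is to scan the .Rnw file for chunks, extract their names, detect which ones are cached, and then determine which ones need to be loaded. Once we have done that, we step through each chunk name that needs to be loaded, detect the database name from the cache folder, and load it using lazyLoad. After loading all of the chunks, we need to force evaluation. It's ugly, and I'm sure there are some bugs, but it seems to work for the simple example you gave and for some other minimal examples I created. This assumes that the .Rnw file is in the current working directory...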

load_cache_until <- function(file, chunk, envir = parent.frame()){
    require(knitr)

    # kludge to detect chunk names, which come before the chunk of
    # interest, and which are cached... there has to be a nicer way...
    text <- readLines(file)
    chunks <- grep("^<<.*>>=", text, value = TRUE)
    chunknames <- gsub("^<<([^,>]*)[,>]*.*", "\\1", chunks)
    #detect unnamed chunks
    tmp <- grep("^\\s*$", chunknames)
    chunknames[tmp] <- paste0("unnamed-chunk-", seq_along(tmp))
    id <- which(chunk == chunknames)
    previouschunks <- chunknames[seq_len(id - 1)]
    cachedchunks <- chunknames[grep("cache\\s*=\\s*T", chunks)]

    # These are the names of the chunks we want to load
    extractchunks <- cachedchunks[cachedchunks %in% previouschunks]

    oldls <- ls(envir, all = TRUE)
    # For each chunk...
    for(ch in extractchunks){   
        # Detect the file name of the database...
        pat <- paste0("^", ch, ".*\\.rdb")
        val <- gsub("\\.rdb$", "", dir("cache", pattern = pat))
        # Lazy load the database
        lazyLoad(file.path("cache", val), envir = envir)
    }
    # Detect the new objects added
    newls <- ls(envir, all = TRUE)
    # Force evaluation...  There is probably a better way
    # to do this too...
    lapply(setdiff(newls, oldls), get, envir = envir)

    invisible()

}

load_cache_until("lotsOfComps.Rnw", "interestingChunk")

Making the code more robust is left as an exercise for the reader.
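For anyone taking up that exercise, the header-parsing step can be tried in isolation. The following is a minimal sketch, not part of the function above: it runs the same regular expressions on a made-up in-memory document (the `rnw` vector and its chunk labels are assumptions for illustration) and shows which labels would be selected for loading.

```r
# Hypothetical stand-in for readLines("lotsOfComps.Rnw"); contents are made up
rnw <- c(
  "<<setup, include=FALSE>>=",
  "library(stats)",
  "@",
  "<<slowChunk, cache=TRUE>>=",
  "x <- sample(1:10, size = 2)",
  "@",
  "<<>>=",
  "plot(x)",
  "@",
  "<<interestingChunk>>=",
  "y <- prod(x)^2",
  "@"
)

# Chunk header lines and the label at the start of each header
headers <- grep("^<<.*>>=", rnw, value = TRUE)
labels  <- gsub("^<<([^,>]*)[,>]*.*", "\\1", headers)

# Empty labels get knitr-style default names
blank <- grep("^\\s*$", labels)
labels[blank] <- paste0("unnamed-chunk-", seq_along(blank))

# Cached chunks that come before the chunk of interest
target  <- "interestingChunk"
earlier <- labels[seq_len(match(target, labels) - 1)]
cached  <- labels[grepl("cache\\s*=\\s*T", headers)]
to_load <- intersect(cached, earlier)

print(labels)
print(to_load)
```

A header such as <<cache=TRUE>>= (options but no label) would be mislabelled by this pattern, which is one of the places where the function could be made more robust.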

score 6 · Accepted Answer

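Here is a solution that is still a little awkward, but it works. The idea is to add a chunk option named, say, mute, which is NULL by default but can also take an R expression, e.g. mute_later() below. When knitr evaluates chunk options, mute_later() is evaluated and returns NULL; at the same time, it has side effects on opts_chunk (setting global chunk options such as eval = FALSE).

Now all you need to do is put mute=mute_later() in the chunk after which you want to skip the rest of the chunks; e.g. you can move this option from example-a to example-b. Because mute_later() returns NULL, which happens to be the default value of the option mute, the cache will not be broken even if you move this option around.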

\documentclass{article}
\begin{document}

<<setup, include=FALSE, cache=FALSE>>=
rm(list = ls(all.names = TRUE), envir = globalenv())
opts_chunk$set(cache = TRUE) # enable cache to make it faster
opts_chunk$set(eval = TRUE, echo = TRUE, include = TRUE)

# set global options to mute later chunks
mute_later = function() {
  opts_chunk$set(cache = FALSE, eval = FALSE, echo = FALSE, include = FALSE)
  NULL
}
# a global option mute=NULL so that using mute_later() will not break cache
opts_chunk$set(mute = NULL)
@

<<example-a, mute=mute_later()>>=
x = rnorm(4)
Sys.sleep(5)
@

<<example-b>>=
y = rpois(10,5)
Sys.sleep(5)
@

<<example-c>>=
z = 1:10
Sys.sleep(3)
@

\end{document}

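It is awkward in the sense that you have to cut and paste , mute=mute_later() around. Ideally you should just set the chunk label, as in the gist I wrote for Barry.

The reason my original gist did not work is that chunk hooks are ignored when a chunk is cached. The second time you knit() the file, the chunk hook checkpoint was skipped for example-a, so eval=TRUE stayed in effect for the rest of the chunks and you saw all chunks evaluated. By contrast, chunk options are always evaluated dynamically.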
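The mechanism can be tried outside a document. This is a small sketch, assuming the knitr package is installed; it calls mute_later() directly rather than through a chunk header, so it only illustrates the side effect on opts_chunk and the NULL return value, not the caching behaviour.

```r
library(knitr)

# Same idea as mute_later() above: change global chunk options, return NULL
mute_later <- function() {
  opts_chunk$set(cache = FALSE, eval = FALSE, echo = FALSE, include = FALSE)
  NULL
}

eval_before <- opts_chunk$get("eval")   # TRUE unless changed earlier in the session
value       <- mute_later()             # the value the 'mute' option would receive
eval_after  <- opts_chunk$get("eval")   # changed by the side effect

print(is.null(value))
print(c(before = eval_before, after = eval_after))

# Put the global chunk options back to knitr's defaults
opts_chunk$restore()
```

Since the returned value is NULL wherever the option is placed, only the side effect differs between chunks.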

score 3 · Accepted Answer

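Yihui pointed to a gist that does almost exactly what I asked for.

In response to a question from Barry Rowlingson (aka Spacedman), Yihui constructed a "checkpoint" hook, which lets the user set the name of the last chunk that will be processed by a call to knit. To process chunks up to and including one named example-a, just do opts_chunk$set(checkpoint = 'example-a') somewhere in the initial "setup" chunk.

The solution works beautifully, but only the first time it is run with a given checkpoint. Unfortunately, the second and subsequent times, knit seems to ignore the checkpoint and processes all of the chunks. (I discuss a workaround below, but it is not ideal.)

Here is a slightly pared-down version of Yihui's gist: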

\documentclass{article}
\begin{document}

<<setup, include=FALSE>>=
rm(list = ls(all.names = TRUE), envir = globalenv())
opts_chunk$set(cache = TRUE) # enable cache to make it faster
opts_chunk$set(eval = TRUE, echo = TRUE, include = TRUE)

# Define hook that will skip all chunks after the one named in checkpoint
knit_hooks$set(checkpoint = function(before, options, envir) {
if (!before && options$label == options$checkpoint) {
opts_chunk$set(cache = FALSE, eval = FALSE, echo = FALSE, include = FALSE)
}
})

## Set the checkpoint
opts_chunk$set(checkpoint = 'example-a') # restore objects up to example-a
@

<<example-a>>=
x = rnorm(4)
@

<<example-b>>=
y = rpois(10,5)
@

<<example-c>>=
z = 1:10
@

\end{document}

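Because checkpoint="example-a", the script above should run through the second chunk and then suppress all further chunks, including those that create y and z. Let's try it a couple of times to see what happens: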

library(knitr)

## First time, works like a charm
knit("checkpoint.Rnw")
ls()
[1] "x"

## Second time, Oops!, runs right past the checkpoint
knit("checkpoint.Rnw")
ls()
[1] "x" "y" "z"

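The workaround I mentioned above is to, after the first run:

1. Edit checkpoint.Rnw to set another checkpoint (by doing, e.g., opts_chunk$set(checkpoint = 'example-b')).
2. Run knit("checkpoint.Rnw").
3. Edit checkpoint.Rnw to set the checkpoint back to 'example-a' (by doing opts_chunk$set(checkpoint = 'example-a')).
4. Run knit("checkpoint.Rnw") once more. This will again process all chunks up to, but not past, example-a.

This is much faster than recomputing the objects in all of the chunks, so it is good to know about, even if it is not ideal.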

score -1 · Accepted Answer

How about adding the following code chunk at the bottom of your markdown file?

```{r save_workspace_if_not_saved_yet, echo=FALSE}
if(!file.exists('knitr_session.RData')) {
  save.image(file = 'knitr_session.RData')
}
```

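The first time you knit, the state of the workspace at the end of the process is saved (assuming the process does not produce any errors). Whenever you want an up-to-date version of the workspace, just delete the file from the working directory.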
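To get the saved objects back without cluttering the global environment, the file can be loaded into a fresh environment. A minimal sketch, assuming a hypothetical object x and using save() on a temporary file as a stand-in for the save.image() call in the chunk above:

```r
# Stand-in for the file written by save.image() at the end of knitting
session_file <- file.path(tempdir(), "knitr_session.RData")
x <- c(3, 8)                       # hypothetical object created by a chunk
save(x, file = session_file)

# Restore into a separate environment and list what came back
restored <- new.env()
loaded   <- load(session_file, envir = restored)

print(loaded)
print(get("x", envir = restored))

unlink(session_file)               # clean up the temporary file
```

Note that this restores the state at the end of the document, not the state at the start of a particular chunk.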

score -3 · Accepted Answer

They are just like any file created by save. If you take the knitr-cache example from its new location, it is just:

> library(knitr)
> knit("./005-latex.Rtex")
> load("cache/latex-my-cache_d9835aca7e54429f59d22eeb251c8b29.RData")
> ls()
 [1] "x"
