r - R: renv in R Notebook-scoped (Rmd) workflows

Question

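I am looking for a way to make my R Notebook-centered workflow more reproducible and, later on, easier to containerize with Docker. For my medium-sized data analysis projects I use a very simple structure: one folder associated with an .Rproj and an index.html (the landing page for GitHub Pages), holding other folders that contain the notebooks, data, scripts, and so on. This simple "1 GitHub repo = 1 Rproj" structure also works well for my nb.html files rendered by GitHub Pages.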

.
└── notebooks_project
    ├── notebook_1
    │   ├── notebook_1.Rmd
    │   └── ...
    ├── notebook_2
    │   ├── notebook_2.Rmd
    │   └── ...
    ├── notebooks_project.Rproj
    ├── README.md
    ├── index.html 
    └── .gitignore

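I would like to keep this workflow, which uses R Notebooks both as literate programming tools and as the controlling documents (see RMarkdown Driven Development), because it seems well suited to medium-sized reproducible analysis projects. Unfortunately, there is little documentation on renv workflows built around Rmd files, even though renv appears to integrate well with them.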

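First, Yihui Xie hints here that the functions relevant to using renv with a single Rmd document are renv::activate(), renv::use() and renv::embed(). renv::activate() does only part of what renv::init() does: it loads the project and sources init.R. As I understand it, this is what happens if the project has already been initialized, but it behaves like renv::init() if the project has not been initialized: it discovers dependencies, copies them into the renv global package cache, and writes several files (.Rprofile, renv/activate.R, renv/.gitignore, .Rbuildignore). renv::use() works well in standalone R scripts, where the script's dependencies are specified directly in the script and those packages need to be installed and loaded automatically when the script runs. renv::embed() simply embeds a compact representation of renv.lock in a code chunk of the notebook: on render/save it changes the .Rmd by adding a code chunk with the dependencies and removing the call to renv::embed(). As I understand it, renv::embed() and renv::use() may be enough for a reproducible standalone notebook. Still, I don't mind having the lockfile in the directory or keeping the renv library, as long as they all live in the same directory.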

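Second, I would like to use renv together with RStudio Package Manager (RSPM), in preparation for later Binder or Docker needs. Grant McDermott provides some useful code here (which I think could go in .Rprofile or in the .Rmd itself) and gives the rationale for it: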

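The lockfile is referenced against RSPM as the default package repository (i.e. where packages are downloaded from), rather than one of the usual CRAN mirrors. Among other things, this enables time-travelling across different package versions and fast installation of pre-compiled R package binaries on Linux.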

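Third, I would like to use the here package to work with relative paths. This seems to be the way to keep notebooks running when they are transferred or run inside a Docker container. Unfortunately, here::here() looks for the .Rproj and will find it in my upper-level folder (i.e. notebooks_project). A .here file, which can be placed with here::set_here(), overrides this behavior, making here::here() point to the notebook folder (i.e. notebook1) as intended. Unfortunately, the .here file only takes effect after restarting the R session or running unloadNamespace("here") (documented here).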

This is what I have tried so far:

---
title: "<br> R Notebook Template" 
subtitle: "RMarkdown Report"
author: "<br> Claudiu Papasteri"
date: "`r format(Sys.time(), '%d %m %Y')`"
output: 
    html_notebook:
            code_folding: hide
            toc: true
            toc_depth: 2
            number_sections: true
            theme: spacelab
            highlight: tango
            font-family: Arial
---

```{r setup, include = FALSE}
  
# Activate renv for the current project
renv::activate()

# Set default package source by operating system, so that we automatically pull in pre-built binary snapshots, rather than building from source.
# This can also be appended to .Rprofile 
if (Sys.info()[["sysname"]] %in% c("Linux", "Windows")) {  # For Linux and Windows use RStudio Package Manager (RSPM)
    options(repos = c(RSPM = "https://packagemanager.rstudio.com/all/latest"))
    } else {
        # For Mac users, we default to installing from CRAN/MRAN instead, since RSPM does not yet support Mac binaries.
        options(repos = c(CRAN = "https://cran.rstudio.com/"))
        # options(renv.config.mran.enabled = TRUE) ## TRUE by default
    }
options(renv.config.repos.override = getOption("repos"))

# Install (if necessary) & Load packages
packages <- c(
  "tidyverse", "here"
)
renv::install(packages, prompt = FALSE)    # install packages that are not in cache
renv::hydrate(update = FALSE)              # install any packages used in the Rnotebook but not provided, do not update  
renv::snapshot(prompt = FALSE)


# Set here to Rnotebook directory
here::set_here()
unloadNamespace("here")                   # need new R session or unload namespace for .here file to take precedence over .Rproj
rrRn_name <- fs::path_file(here::here())

# Set knitr options including root.dir pointing to the .here file in Rnotebook directory
knitr::opts_chunk$set(root.dir = here::here())

# ???
renv::use(lockfile = here::here("renv.lock"), attach = TRUE)  # automatically provision an R library when the Rnotebook is run, and load packages
# renv::embed(path = here::here(rrRn_name), lockfile = here::here("renv.lock"))  # if run this embeds the renv.lock inside the Rnotebook

renv::status()$synchronized
```

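I want my notebooks to run without any code changes both locally (where the dependencies are already installed and cached and the project is initialized) and when transferred to other systems. Each notebook should have its own renv setup.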

I have a lot of questions:

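What is wrong with my renv sequence? Is calling renv::activate() on every run (at initialization and afterwards) the way to go? Should I use renv::use() instead of renv::install() and renv::hydrate()? Is renv::embed() better for a reproducible workflow even if every notebook folder should have its own renv.lock and library? On activation, renv also creates an .Rproj file (e.g. notebook1.Rproj), which breaks my simple 1 repo = 1 Rproj structure. Should this concern me?
The renv-RSPM workflow looks great, but is there any advantage to storing that script in .Rprofile rather than in the Rmd itself?
Is there a better way to use here? That unloadNamespace("here") seems hacky, but it seems to be the only way to preserve the use of the .here file.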

score 1 · Accepted Answer

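What is wrong with my renv sequence? Is calling renv::activate() on every run (at initialization and afterwards) the way to go? Should I use renv::use() instead of renv::install() and renv::hydrate()? Is renv::embed() better for a reproducible workflow even if every notebook folder should have its own renv.lock and library?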

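If you already have a lockfile that you want to use and that is associated with your project, then I would recommend just calling renv::restore(lockfile = "/path/to/lockfile") rather than using renv::use() or renv::embed(). Those tools are specifically for the case where you do not want to use an external lockfile; that is, you would rather embed your document's dependencies in the document itself.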

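The question of renv::restore() vs renv::install() comes down to whether you want the exact package versions encoded in the lockfile, or the latest versions available from the R package repositories visible to your session. I think the most typical workflow is something like: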

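Use renv::install(), renv::hydrate(), or other tools to install packages as needed;
Confirm that your document is in a good, runnable state;
Call renv::snapshot() to "save" that state;
Use renv::restore() in future runs of your document to "load" that previously saved state.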
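
The steps above could be sketched roughly as follows. This is only an illustration under my own assumptions: next_renv_step() and provision_notebook() are hypothetical helpers (not part of renv), the lockfile is assumed to sit next to the notebook as renv.lock, and checking that the document actually runs is still something you do by hand before trusting the snapshot.

```r
# Hypothetical helper (not part of renv): decide which step applies,
# based only on whether a lockfile already exists.
next_renv_step <- function(lockfile = "renv.lock") {
  if (file.exists(lockfile)) "restore" else "install_then_snapshot"
}

# Hypothetical wrapper around the renv calls named in the steps above.
# It is only defined here, not called, since it needs renv and network access.
provision_notebook <- function(packages, lockfile = "renv.lock") {
  step <- next_renv_step(lockfile)
  if (step == "restore") {
    # Later runs: reinstall the exact versions recorded in the lockfile.
    renv::restore(lockfile = lockfile, prompt = FALSE)
  } else {
    # First run: install what the notebook needs, then record that state.
    # (Confirm the notebook renders correctly before relying on the snapshot.)
    renv::install(packages, prompt = FALSE)
    renv::snapshot(lockfile = lockfile, prompt = FALSE)
  }
  invisible(step)
}
```

In a setup chunk this might be called as provision_notebook(c("tidyverse", "here")), so that the first run installs and snapshots while later runs only restore.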

On activation, renv also creates an .Rproj file (e.g. notebook1.Rproj), which breaks my simple 1 repo = 1 Rproj structure. Should this concern me?

If that is undesired behavior, you may want to file a bug report at https://github.com/rstudio/renv/issues with a bit more context.

The renv-RSPM workflow looks great, but is there any advantage to storing that script in .Rprofile rather than in the Rmd itself?

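It just depends on how visible you want that configuration to be. Do you want it to be active for all R sessions launched in that project directory? If so, then it probably belongs in .Rprofile. Do you want it to be active only for that particular R Markdown document? If so, it may be worth including there. (Bundling it in the R Markdown file also makes it easier to share, since you can share just the R Markdown document without also sharing the project / .Rprofile.)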
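
One way to keep that choice open is to wrap the repository selection in a small function that can be pasted into either place. A minimal sketch, assuming the same two repository URLs used in the question; choose_repos() is a hypothetical name of mine, not an renv or RSPM function:

```r
# Hypothetical helper: return the repository vector for a given OS name.
# The URLs are the ones from the question, not a recommendation of mine.
choose_repos <- function(sysname = Sys.info()[["sysname"]]) {
  if (sysname %in% c("Linux", "Windows")) {
    c(RSPM = "https://packagemanager.rstudio.com/all/latest")
  } else {
    c(CRAN = "https://cran.rstudio.com/")
  }
}

# Put this line in .Rprofile to affect every session in the project,
# or in the notebook's setup chunk to affect only that document.
options(repos = choose_repos())
```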

Is there a better way to use here? That unloadNamespace("here") seems hacky, but it seems to be the only way to preserve the use of the .here file.

If I understand correctly, you could just manually create a .here file yourself before loading the here package, e.g.

file.create("/path/to/.here")
library(here)

since that is all set_here() really does.
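
A slightly fuller sketch of that idea, with a temporary directory standing in for the notebook folder (the notebook_1 path and the marker variable are just placeholders of mine):

```r
# Placeholder notebook folder; in a real project this would be notebook_1/.
notebook_dir <- file.path(tempdir(), "notebook_1")
dir.create(notebook_dir, recursive = TRUE, showWarnings = FALSE)

# Create the empty marker file before the here package is ever loaded.
marker <- file.path(notebook_dir, ".here")
if (!file.exists(marker)) {
  file.create(marker)
}

# here picks its root when it is first loaded, so load it only now,
# and only while the working directory is the notebook folder.
if (requireNamespace("here", quietly = TRUE) && !isNamespaceLoaded("here")) {
  old_wd <- setwd(notebook_dir)
  print(here::here())  # should resolve to the notebook folder
  setwd(old_wd)
}
```

As far as I know, knitr evaluates chunks with the working directory set to the folder containing the .Rmd by default, so a .here file next to the notebook should be picked up as long as here has not already been loaded in that session.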
