r - knitr vs. interactive R behavior

Question

I am reposting my question here after noticing that this is the way the knitr author recommends for getting more help.

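I am a bit puzzled by a .Rmd file that I can process line by line in an interactive R session, and also with R CMD BATCH, but that fails when I use knit("test.Rmd"). I'm not sure where the problem lies, so I tried to narrow it down as much as possible. Here is the example (in test.Rmd):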

```{r Rinit, include = FALSE, cache = FALSE}
opts_knit$set(stop_on_error = 2L)
library(adehabitatLT)
```

The functions to be used later:

```{r functions}
ld <- function(ltraj) {
    if (!inherits(ltraj, "ltraj")) 
        stop("ltraj should be of class ltraj")
    inf <- infolocs(ltraj)
    df <- data.frame(
        x = unlist(lapply(ltraj, function(x) x$x)),
        y = unlist(lapply(ltraj, function(x) x$y)),
        date = unlist(lapply(ltraj, function(x) x$date)),
        dx = unlist(lapply(ltraj, function(x) x$dx)),
        dy = unlist(lapply(ltraj, function(x) x$dy)),
        dist = unlist(lapply(ltraj, function(x) x$dist)),
        dt = unlist(lapply(ltraj, function(x) x$dt)),
        R2n = unlist(lapply(ltraj, function(x) x$R2n)),
        abs.angle = unlist(lapply(ltraj, function(x) x$abs.angle)),
        rel.angle = unlist(lapply(ltraj, function(x) x$rel.angle)),
        id = rep(id(ltraj), sapply(ltraj, nrow)),
        burst = rep(burst(ltraj), sapply(ltraj, nrow)))
    class(df$date) <- c("POSIXct", "POSIXt")
    attr(df$date, "tzone") <- attr(ltraj[[1]]$date, "tzone")
    if (!is.null(inf)) {
        nc <- ncol(inf[[1]])
        infdf <- as.data.frame(matrix(nrow = nrow(df), ncol = nc))
        names(infdf) <- names(inf[[1]])
        for (i in 1:nc) infdf[[i]] <- unlist(lapply(inf, function(x) x[[i]]))
        df <- cbind(df, infdf)
    }
    return(df)
}
ltraj2sldf <- function(ltr, proj4string = CRS(as.character(NA))) {
    if (!inherits(ltr, "ltraj")) 
        stop("ltr should be of class ltraj")
    df <- ld(ltr)
    df <- subset(df, !is.na(dist))
    coords <- data.frame(df[, c("x", "y", "dx", "dy")], id = as.numeric(row.names(df)))
    res <- apply(coords, 1, function(dfi) Lines(Line(matrix(c(dfi["x"], 
        dfi["y"], dfi["x"] + dfi["dx"], dfi["y"] + dfi["dy"]), 
        ncol = 2, byrow = TRUE)), ID = format(dfi["id"], scientific = FALSE)))
    res <- SpatialLinesDataFrame(SpatialLines(res, proj4string = proj4string), 
        data = df)
    return(res)
}
```

I load the object and apply the `ltraj2sldf` function:

```{r fail}
load("tr.RData")
juvStp <- ltraj2sldf(trajjuv, proj4string = CRS("+init=epsg:32617"))
dim(juvStp)
```

Using knit("test.Rmd") fails:

label: fail
Quitting from lines 66-75 (test.Rmd) 
Error in SpatialLinesDataFrame(SpatialLines(res, proj4string = 
proj4string),  (from     <text>#32) : 
  row.names of data and Lines IDs do not match

Running the same call directly in the R console after the error works as expected...

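The problem is related to the way format generates the IDs (in the apply call in ltraj2sldf) just before ID 100,000. With an interactive call, R gives "99994", "99995", "99996", "99997", "99998", "99999", "100000"; with knitr, R gives "99994", " 99995", " 99996", " 99997", " 99998", " 99999", "100000", with an extra leading space.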
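A possible way to sidestep the formatting step altogether: SpatialLinesDataFrame() (with its default match.ID = TRUE) compares the Lines IDs with row.names(data), so the IDs could be taken directly from the row names of the filtered data frame instead of being rebuilt with format(). Below is a minimal base-R sketch of that idea; toy_df is a made-up stand-in for the df built inside ltraj2sldf(), not the question's data.

```r
# Hypothetical stand-in for the data frame built inside ltraj2sldf():
# 100010 rows, two of which have NA distances that subset() will drop.
toy_df <- data.frame(x = seq_len(100010), dist = 1)
toy_df$dist[c(1, 5000)] <- NA
toy_df <- subset(toy_df, !is.na(dist))

# IDs taken directly from the row names: no number formatting is involved,
# so getOption("digits") cannot change them.
ids <- row.names(toy_df)

head(ids, 3)  # subset() keeps the original row names: "2" "3" "4"
tail(ids, 1)  # "100010"
```

Inside ltraj2sldf(), each Lines object could then receive ID = ids[i] (for example by looping over seq_len(nrow(df)) with lapply() instead of apply()), so the IDs match row.names(df) by construction. This is an untested suggestion, not checked against the question's .RData.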

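Is there a reason for this behavior? Why should knitr behave differently from a direct call in R? I have to admit I'm having a hard time with this one, since I can't debug it (it works in an interactive session)!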

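Any hint would be much appreciated. I can provide the .RData if it helps (the file is 4.5 MB), but I'm mostly interested in why this difference happens. I tried to come up with a self-contained reproducible example, without success, sorry about that. Thanks in advance for any contribution!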

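Following baptiste's comment, here are some more details about the ID generation. Basically, the IDs are generated for each row of the data frame by an apply call, which in turn uses format like this: format(dfi["id"], scientific = FALSE). Here, the column id is simply a sequence from 1 to the number of rows (1:nrow(df)). scientific = FALSE is only there to make sure I don't get results like 1e+05 for 100000.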
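Since these IDs are whole numbers, another option is to format them as integers, which never uses scientific notation and does not consult getOption("digits"). A minimal sketch, where make_ids() is a hypothetical helper and options(digits = 4) is set only to mimic knitr's chunk setting (as the answers point out):

```r
# Hypothetical helper: character IDs 1..n built from an integer sequence.
# sprintf("%d", ...) never uses scientific notation or getOption("digits").
make_ids <- function(n) sprintf("%d", seq_len(n))

old <- options(digits = 4)  # mimic the setting knitr uses inside chunks
ids <- make_ids(100010)
options(old)                # restore whatever was set before

ids[99994:100000]
```

as.character(seq_len(n)) would behave the same way here, because integer vectors are never printed in scientific notation.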

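From exploring the ID generation, the problem appears only for the IDs mentioned in my first message, i.e. 99995 to 99999, which get a leading space. This should not happen with this format call, since I don't ask for a specific number of digits in the output. For example: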

> format(99994:99999, scientific = FALSE)
[1] "99994" "99995" "99996" "99997" "99998" "99999"

However, it can happen if the IDs are formatted together in a single call:

> format(99994:100000, scientific = FALSE)
[1] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"

Note that processing the same values one at a time gives the expected result:

> for (i in 99994:100000) print(format(i, scientific = FALSE))
[1] "99994"
[1] "99995"
[1] "99996"
[1] "99997"
[1] "99998"
[1] "99999"
[1] "100000"

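In the end, it is as if the IDs were not prepared one at a time (as I would expect with apply) but, in this case, six at a time, and only when getting close to 1e+05... And of course only when using knitr, not in interactive or batch R.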
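For what it's worth, apply() does call the function once per row. For a data frame it first converts the input with as.matrix(), so each dfi is a named numeric (double) vector and dfi["id"] is a single double value. A quick check on a tiny made-up data frame (toy is hypothetical):

```r
toy <- data.frame(x = c(0.5, 1.5, 2.5), id = c(99998L, 99999L, 100000L))

m <- as.matrix(toy)  # what apply() does internally for a data frame
typeof(m)            # "double": the integer id column is promoted alongside x

calls <- 0
res <- apply(toy, 1, function(dfi) {
  calls <<- calls + 1                # count how often the function runs
  stopifnot(length(dfi["id"]) == 1)  # one value per call
  class(dfi["id"])
})
calls  # 3, one call per row
res    # "numeric" for every row
```

So the blanks are not caused by batching; as the answers below explain, they come from formatting a single double while digits is 4.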

Here is my session info:

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.2           adehabitatLT_0.3.12 CircStats_0.2-4    
[4] boot_1.3-9          MASS_7.3-27         adehabitatMA_0.3.6 
[7] ade4_1.5-2          sp_1.0-11           basr_0.5.3         

loaded via a namespace (and not attached):
[1] digest_0.6.3    evaluate_0.4.4  formatR_0.8     fortunes_1.5-0 
[5] grid_3.0.1      lattice_0.20-15 stringr_0.6.2   tools_3.0.1

score 3 · Accepted Answer

Jeff and baptiste were indeed right! It's an options issue, related to the digits option. I managed to come up with a working minimal example (e.g. in test.Rmd):

Simple reproducible example: df1 is a data frame of 110,000 rows,
with 2 random normal variables plus an `id` variable, which is a
sequence from 1 to the number of rows.

```{r example}
df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
```

From this, we create an `id2` variable using `format` and `scientific =
FALSE` to get all the digits instead of scientific notation
(e.g. 100,000 instead of 1e+05):

```{r example-continued}
df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df1$id2[99990:100010]
```

Run interactively in R, it works as expected, giving:

 [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996" 
 [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

However, the result is quite different with knit:

> library(knitr)
> knit("test.Rmd")

[...]

##  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
##  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
## [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

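Note the extra leading spaces after 99994. The difference actually comes from the digits option, as Jeff correctly suggested: R uses 7 by default, while knitr uses 4. This difference affects the output of format, although I don't really understand what is going on here. R style: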

> options(digits = 7)
> format(99999, scientific = FALSE)
[1] "99999"

knitr style:

> options(digits = 4)
> format(99999, scientific = FALSE)
[1] " 99999"

But then it should affect all numbers, not just those after 99994 (well, to be honest, I don't even understand why it adds leading spaces):

> options(digits = 4)
> format(c(1:10, 99990:100000), scientific = FALSE)
 [1] "     1" "     2" "     3" "     4" "     5" "     6" "     7"
 [8] "     8" "     9" "    10" " 99990" " 99991" " 99992" " 99993"
[15] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"
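For comparison, the padding in this particular output is format()'s ordinary behaviour for vectors: with the default trim = FALSE, every element is right-justified to a common width (here the 6 characters of "100000"), whatever digits is set to. A quick check, assuming a current R session:

```r
x <- c(1:10, 99990:100000)  # an integer vector of 21 values

old <- options(digits = 4)
with4 <- format(x, scientific = FALSE)
options(digits = 7)
with7 <- format(x, scientific = FALSE)
options(old)                # restore the previous setting

identical(with4, with7)     # TRUE: the common-width padding ignores digits
with4[c(1, 21)]             # "     1" "100000"
```

The digits-related blank only shows up when a single value is formatted on its own, as in the per-row apply() call.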

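From this, I don't know which one is at fault: knitr, apply or format? At least I came up with a workaround, using trim = TRUE in format. It doesn't address the cause of the problem, but it does remove the leading spaces from the result...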
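The trim = TRUE workaround applied to the minimal example could look like the sketch below. It uses a shorter made-up data frame (df2 is hypothetical) so it runs quickly, and, as noted above, it only strips leading blanks rather than changing how digits is used:

```r
df2 <- data.frame(x = rnorm(21), id = 99990:100010)

old <- options(digits = 4)  # the knitr setting
# trim = TRUE suppresses the leading blanks format() would otherwise add
df2$id2 <- apply(df2, 1, function(dfi)
  format(dfi["id"], scientific = FALSE, trim = TRUE))
options(old)

df2$id2
```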

score 2 · Accepted Answer

I added a comment with this information to your knitr GitHub issue.

format() adds the extra whitespace when the digits option is not sufficient to display a value but scientific=FALSE is also specified. knitr sets digits to 4 inside code chunks, which causes the behavior you describe:

options(digits=4)
format(99999, scientific=FALSE)

produces:

[1] " 99999"

whereas:

options(digits=5)
format(99999, scientific=FALSE)

produces:

[1] "99999"
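format() only falls back to getOption("digits") when its own digits argument is left at its default (NULL), so another workaround is to pass digits explicitly. A sketch, where fmt_id() is a hypothetical helper and 15 is an arbitrary value comfortably above the 6 significant digits these IDs need:

```r
# Hypothetical helper: format IDs with an explicit digits value so the
# session-wide getOption("digits") (4 inside knitr chunks) is not consulted.
fmt_id <- function(x) format(x, scientific = FALSE, digits = 15)

old <- options(digits = 4)  # mimic the knitr setting
out <- vapply(c(99994, 99999, 100000), fmt_id, character(1))
options(old)                # restore the previous setting

out  # "99994" "99999" "100000"
```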

score 0 · Accepted Answer

Thanks to Aleksey Vorona and Duncan Murdoch, this bug is now fixed in R-devel!

See: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15411
