r - Problems with ggplot and pgfSweave

Question

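I started using Sweave a while ago. Like most people, though, I soon ran into one major problem: speed. Weaving a large document takes a long time to run, which makes it hard to work efficiently. Data processing can be sped up a great deal with cacheSweave. Plots, however, and ggplot plots in particular ;), still take a long time to render. That is why I wanted to use pgfSweave.

After many, many hours I finally managed to set up a working system with Eclipse/StatET/Texlipse. I then wanted to convert an existing report to pgfSweave and got a nasty surprise: most of my ggplots no longer seem to work. For example, the following plot works perfectly in the console and in Sweave: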

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

Running it with pgfSweave, however, I get this error:

Error in if (width > 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (width > 0) { :
  the condition has length > 1 and only the first element will be used
Error in driver$runcode(drobj, chunk, chunkopts) : 
  Error in if (width > 0) { : missing value where TRUE/FALSE needed

When I remove aes(...) from geom_point, the plot works perfectly with pgfSweave.

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)

Edit: I investigated further and could narrow the problem down to the tikz device.

This works fine:

quartz()
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

This gives the error above:

tikz( 'myPlot.tex',standAlone = T )
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)
dev.off()

This also works fine:

tikz( 'myPlot.tex',standAlone = T )
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)
dev.off()

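I can reproduce this with 5 different ggplots. As long as the mapping does not use colour (or size, alpha, ...), the plot works with tikz.

Q1: Does anybody have an explanation for this behaviour?

In addition, caching of non-plot code chunks does not work very well. With Sweave, the following code chunk takes no time at all. With pgfSweave it takes about 10 seconds.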

<<plot.opts,echo=FALSE,results=hide,cache=TRUE>>=
#colour and plot options are globally set
pal1 <- brewer.pal(8,"Set1")
pal_seq <- brewer.pal(8,"YlOrRd")
pal_seq <- c("steelblue1","tomato2")
opt1 <- opts(panel.grid.major = theme_line(colour = "white"),panel.grid.minor = theme_line(colour = "white"))
sca_fill_cont_opt <- scale_fill_continuous(low="steelblue1", high="tomato2")
ory <- geom_hline(yintercept=0,alpha=0.4,linetype=2) 
orx <- geom_vline(xintercept=0,alpha=0.4,linetype=2)
ts1 <- 2.3
ts2 <- 2.5
ts3 <- 2.8
ps1 <- 6
offset_x <- function(x,y) 0.15*x/pmax(abs(x),abs(y))
offset_y <- function(x,y) 0.05*y/pmax(abs(x),abs(y))
plot_size <- 50*50
@

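This also seems like very odd behaviour, since the chunk only sets a few variables for later use.

Q2: Does anybody have an explanation for this?

Q3: More generally, I would like to ask whether anybody is using pgfSweave successfully. By successfully I mean that everything that works in Sweave also works in pgfSweave, with the added benefit of nice fonts and improved speed. ;)

Thanks very much for your replies!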

score 4 · Accepted Answer

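Q1: Does anybody have an explanation for this behaviour?

Three things combine to make tikzDevice fail when it tries to build the plot:

When you add an aesthetic mapping that creates a legend, such as aes(colour=que_id), ggplot2 uses the variable name as the legend title, in this case que_id.
tikzDevice passes all strings, such as legend titles, to LaTeX for typesetting.
In LaTeX, the underscore character, _, denotes a subscript. An underscore used outside of math mode causes an error.

When tikzDevice tries to calculate the height and width of the legend title "que_id", it passes the string to LaTeX for typesetting and expects LaTeX to return the string's width and height. LaTeX raises an error because the string contains an unescaped underscore outside of math mode. tikzDevice therefore receives NULL for the string width instead of a number, which causes the if (width > 0) check to fail.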
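As a quick pre-flight check before plotting with tikz(), something like the following base-R sketch can flag the column names that would end up as legend titles and trip LaTeX. The helper name find_risky_names, the character list (an assumption, not exhaustive, and not taken from tikzDevice) and the stand-in plot_info data frame are all hypothetical, since the question does not show the real data.

```r
# Hypothetical helper: list column names that contain characters LaTeX treats
# specially in text mode. Such names need escaping or a custom legend title
# before they are used in a plot drawn with tikz().
find_risky_names <- function(data) {
  # Character class: _ % $ # & { } ^ ~ and backslash (assumed list)
  grep("[_%$#&{}^~\\\\]", names(data), value = TRUE)
}

# Stand-in for plot_info, which is not shown in the question
plot_info <- data.frame(elevation = 1:3,
                        area      = c(10, 20, 30),
                        que_id    = factor(c("a", "b", "c")))

risky <- find_risky_names(plot_info)
print(risky)
```

Any name reported here is a candidate for an explicit legend title or for string sanitization.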

Ways to avoid the problem

Specify the legend title to use by adding a colour scale:

p1 <- ggplot(plot_info, aes(elevation, area))
p1 <- p1 + geom_point(aes(colour=que_id))


# Add a name that is easier for humans to read than the variable name
p1 <- p1 + scale_colour_brewer(name="Que ID")


# Or, replace the underscore with the appropriate LaTeX escape sequence
p1 <- p1 + scale_colour_brewer(name="que\\textunderscore id")

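Use the string sanitization feature that was introduced in tikzDevice 0.5.0 (but was broken until 0.5.2). Currently, string sanitization only escapes the following characters by default: %, $, }, { and ^. However, you can specify additional replacement pairs through the tikzSanitizeCharacters and tikzReplacementCharacters options: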

# Add underscores to the sanitization list
options(tikzSanitizeCharacters = c('%','$','}','{','^', '_'))
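# The trailing {} ends the control word so that it does not merge with
# the letters that follow it (e.g. "que_id" must not become \textunderscoreid)
options(tikzReplacementCharacters = c('\\%','\\$','\\}','\\{',
  '\\^{}', '\\textunderscore{}'))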


# Turn on string sanitization when starting the plotting device
tikz('myPlot.tex', standAlone = TRUE, sanitize = TRUE)
print(p1)
dev.off()
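To see what the replacement pairs amount to, here is a minimal base-R sketch of a character-by-character substitution. It is an illustration under assumptions, not tikzDevice's own code: sanitize_sketch is a hypothetical name, and the '\\textunderscore{}' replacement (with {} to terminate the control word) is chosen for this example.

```r
# Hypothetical illustration of "sanitize character -> replacement" pairs.
# Not tikzDevice's implementation; it only shows the idea.
sanitize_sketch <- function(x, chars, replacements) {
  stopifnot(length(chars) == length(replacements))
  # Work on single characters so that inserted replacements are never
  # scanned again (e.g. the braces inside '\\^{}').
  pieces <- strsplit(x, "", fixed = TRUE)[[1]]
  idx <- match(pieces, chars)
  hit <- !is.na(idx)
  pieces[hit] <- replacements[idx[hit]]
  paste(pieces, collapse = "")
}

chars <- c('%', '$', '}', '{', '^', '_')
repl  <- c('\\%', '\\$', '\\}', '\\{', '\\^{}', '\\textunderscore{}')

clean <- sanitize_sketch("que_id", chars, repl)
cat(clean, "\n")
```

Splitting into single characters first is a deliberate choice: a chain of gsub() calls would re-escape the braces introduced by earlier replacements.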

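We will release version 0.5.3 of tikzDevice in the next couple of weeks to address some annoying warning messages that have appeared because of changes in the way R handles system(). I will add the following changes to that release:

A better warning message when width is NULL, indicating that there is probably a problem with the plot text.
Underscores and a few other characters added to the default set of characters that the string sanitizer looks for.

Hope this helps!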

score 3 · Accepted Answer

Q2: I am the maintainer of pgfSweave.

Here are the results of a test I ran:

time R CMD Sweave time-test.Rnw 

real    0m1.133s
user    0m1.068s
sys     0m0.054s

time R CMD pgfsweave time-test.Rnw 

real    0m2.941s
user    0m2.413s
sys     0m0.364s

time R CMD pgfsweave time-test.Rnw 

real    0m2.457s
user    0m2.112s
sys     0m0.283s

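I believe there are two reasons for the time difference, but verifying them exactly would take more work:

pgfSweave does a lot of checking and double checking to make sure it does not redo expensive calculations. The goal is to make it feasible to do more expensive calculations and plotting within a document. "Expensive" in this sense means far more than the extra second or two spent on checking.

As an example, consider the following test file, which shows the real benefit of caching: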

\documentclass{article}

\begin{document}

<<plot.opts,cache=TRUE>>=
x <- Sys.sleep(10)
@

\end{document}

Results:

time R CMD Sweave time-test2.Rnw 

real    0m10.334s
user    0m0.283s
sys     0m0.047s

time R CMD pgfsweave time-test2.Rnw 

real    0m12.032s
user    0m1.356s
sys     0m0.349s

time R CMD pgfsweave time-test2.Rnw 

real    0m1.423s
user    0m1.121s
sys     0m0.266s
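The pattern in these timings (a slow first run, a fast second run) can be imitated with a very small cache in base R. This is only a sketch of the general idea and not how pgfSweave or cacheSweave actually store their caches; cached_eval and cache are hypothetical names, and keying on the chunk's source text is an assumption made for the example.

```r
# Hypothetical in-memory cache keyed on the source text of a chunk.
cache <- new.env()

cached_eval <- function(code) {
  key <- paste(code, collapse = "\n")
  if (!exists(key, envir = cache, inherits = FALSE)) {
    # Cache miss: evaluate the code once and remember the result
    assign(key, eval(parse(text = code)), envir = cache)
  }
  # Cache hit (or freshly stored value): return without re-evaluating
  get(key, envir = cache, inherits = FALSE)
}

slow_chunk <- "{ Sys.sleep(1); 42 }"
first  <- system.time(cached_eval(slow_chunk))[["elapsed"]]
second <- system.time(cached_eval(slow_chunk))[["elapsed"]]
cat("first run:", first, "s; second run:", second, "s\n")
```

The lookup is pure overhead for a cheap chunk, and only pays off once the cached computation is slower than the check itself.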

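Sweave changed somewhat in R 2.12. These changes may have sped up code chunk evaluation and left pgfSweave behind on these smaller computations. This is worth looking into.

Q3: I use pgfSweave all the time for my own work. The changes to Sweave in R 2.12 cause some minor problems with pgfSweave, but a new version that fixes them all is coming soon. The development version on github ( https://github.com/cameronbracken/pgfSweave ) already has the changes. If you have other problems, I would be happy to help.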

score 1 · Accepted Answer

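Q2: Do you use \pgfrealjobname{<DOCUMENTNAME>} in the header and the option external=TRUE for the graphics chunks? I found that this increases the speed a lot (not for the first compilation, but for subsequent ones if the graphics are unchanged). You will find more background in the pgfSweave vignette.

Q3: Everything works fine for me. Like you, I use Windows + Eclipse/StatET/Texlipse.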
