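I am trying to produce a quality control report that loops (with sapply) over several folders, one per experiment, and creates tables and plots for each loaded result (inside a function). The resulting pdf should contain the name of the folder followed by the tables and plots, in order. I first wrote an R script (which works fine) and then created an rnw file. The plots are produced, but there are two problems in the pdf output:
No tables are produced in the chunk loop_n_plots;
After all the plots have been created, an unexpected messy line appears that looks like the printed output of a list.
Question: how do I get the tables into my pdf? The table produced in the chunk "table_files" works, but the tables inside the applied function do not. Why? More generally, is what I am trying to do (and the way I am doing it) reasonable for a knitr report? Would it be better to add the tables and plots to a list and then iterate over the list to print them?
I have been playing with the chunk settings for a while, but nothing has worked.
Sample code: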
\documentclass{report}
\begin{document}
\title{Sequencing Quality Report}
\author{Deep Sequencing Group - SFB655}
\maketitle
<<knitr_option, cache=FALSE, echo=FALSE, results='hide'>>=
library(knitr)
## set global chunk options
opts_chunk$set(fig.align='center', fig.width=14, fig.height=8, out.width="1.2\\textwidth", par=TRUE)
@
<<R_arguments, cache=FALSE, echo=FALSE, include=FALSE>>=
###### Libraries ######
library(reshape)
library(ggplot2)
theme_set(theme_bw(16)) # removes grey grid and increases letter size. Ideal for presentations
library(RColorBrewer)
library(plyr)
library(scales) # for natural numbers in axis
library(xtable)
library(rattle) # needed to generate a table in knitr?
#######################
###### Function definitions ######
## ggplot theme with extra space between legends and axis
gg.axis.space <- theme(axis.title.y=element_text(vjust=0.2), axis.title.x=element_text(vjust=0.2))
ReturnStatsPlotsAndTables <- function(fqc.folder){
# for(fqc.folder in fq_fastqc.folders){
######################################
## for each folder in the vector will
## plot stats and
## print tables of fastQC results
## which library is being analysed?
fastq.lib <- data.frame(Libraries = gsub(".*/(L.*)\\.fq_fastqc", "\\1", fqc.folder, perl=T))
xtable(fastq.lib)
## Basic statistics - table ##
stats.path <- paste(fqc.folder, "/", "Basic_Statistics_fastqc_data.temp", sep="")
basic.stats <- read.table(stats.path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# basic.stats[ ,1:2]
xtable(basic.stats[ ,1:2])
## Summary of filters - table ##
stats.path <- paste(fqc.folder, "/", "filters_summary_fastqc_data.temp", sep="")
summary.filters <- read.table(stats.path,
header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# summary.filters
xtable(summary.filters)
## Per base sequence quality ##
stats.path <- paste(fqc.folder, "/", "Per_base_sequence_quality_fastqc_data.temp", sep="")
base.qual <- read.table(stats.path,
header = TRUE, sep = "\t", stringsAsFactors = FALSE)
base.qual$Base <- factor(base.qual$Base, as.character(base.qual$Base)) # re-order the levels by order of appearance in DF
plot.new()
base.qual.p <- ggplot(base.qual, aes(x = Base, ymin = X10th.Percentile, lower = Lower.Quartile, middle = Median, upper = Upper.Quartile, ymax = X90th.Percentile, fill = Lower.Quartile)) + geom_boxplot(stat = "identity") +
theme(axis.text.x = element_text(angle=30, hjust=1, vjust=1)) +
annotate("rect", xmin=-Inf, xmax=Inf, ymin=0, ymax=20, alpha=0.1, fill="red") +
annotate("rect", xmin=-Inf, xmax=Inf, ymin=20, ymax=28, alpha=0.1, fill="yellow") +
annotate("rect", xmin=-Inf, xmax=Inf, ymin=28, ymax=Inf, alpha=0.1, fill="green") +
ggtitle("Per base sequence quality") + ylab("Quality score (Phred score) ") + xlab("Position of base in read")
print(base.qual.p)
}
@
\chapter{Preamble}
This is an automated quality control report generated for the following fastq files:
<<table_files, echo=FALSE, results="asis">>=
##############################################
## loop over fastQC folder and parse txt files:
## list and read fastqc_data.temp old files
# testing #
# setwd("/projects/seq-work/analysis/martinad/p0196-totalRNA/")
folder <- "./"
filenames <- list.files(path=folder, pattern="fastqc_data.temp", recursive=TRUE)
fq_fastqc.folders <- unique(dirname(filenames)) # the folders that contain fastQC
fastq.libs <- data.frame(Libraries = gsub(".*/(L.*)\\.fq_fastqc", "\\1", fq_fastqc.folders, perl=T))
xtable(fastq.libs)
@
\chapter{FastQC}
<<loop_n_plots, echo=FALSE, results='asis'>>=
## do the plotting
sapply(fq_fastqc.folders[1:3], ReturnStatsPlotsAndTables)
@
\end{document}
The ReturnStatsPlotsAndTables function is actually longer; this should be enough to give you an idea of what is going on.
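
For reference, here is a minimal sketch of the behaviour I suspect is involved. As far as I understand, R only auto-prints a value at the top level, so an xtable() call on its own line inside a function builds the table and then discards it, whereas the same call at the top level of a chunk gets printed. The function names tables_dropped and tables_printed and the demo data frame are hypothetical stand-ins, not part of my report; only xtable() and its print method come from the xtable package.

```r
library(xtable)

# Hypothetical stand-in: the table is built but never printed
tables_dropped <- function(df) {
  xtable(df)            # value is created, then discarded: nothing is written
  invisible(NULL)
}

# Hypothetical stand-in: the table is printed explicitly
tables_printed <- function(df) {
  print(xtable(df))     # print.xtable writes the LaTeX code to the output
  invisible(NULL)       # return nothing visible to the caller
}

demo <- data.frame(Libraries = c("L1", "L2"))

# capture.output() collects whatever would be written to the document
silent <- capture.output(tables_dropped(demo))
latex  <- capture.output(tables_printed(demo))

# Wrapping the loop in invisible() keeps the returned list out of the output
looped <- capture.output(invisible(lapply(list(demo, demo), tables_printed)))
```

If this is right, then in a chunk with results='asis' each table inside the function would need an explicit print(), and the stray list at the end would be the visible return value of sapply, which invisible() (or a plain for loop) should suppress.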