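Edit: This is a revised answer, written after discussing the problem with the original poster. An old answer that does not solve the problem at hand is kept below for posterity.
This answer is neither short nor concise, and I hope there is a cleaner way. But the following will work: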
## generate example data
set.seed(1)
death<-runif(1000)<=.75
ICU<-runif(1000)<=.63
serum<-runif(1000)<=.80
urine<-runif(1000)<=.77
brain<-runif(1000)<=.92
kidney<-runif(1000)<=.22
df<-as.data.frame(cbind((1:1000),death,ICU,serum,urine,brain,kidney))
## load up our data manipulation workhorses
library(reshape2)
library(plyr)
## save typing by saving row and column var names
row.vars <- c("serum", "urine", "brain", "kidney")
col.vars <- c("death", "ICU")
## melt data so the row variables (serum, urine, ...) are stacked in one column
dat.m <- melt(df, measure.vars = row.vars)
## keep only the rows where the row variable is 1
dat.m <- dat.m[dat.m$value == 1, ]
## for each row variable, count the 1's in death and ICU
tab <- ddply(dat.m, "variable", function(DF) {
  colwise(function(x) length(x[x==1]))(DF[col.vars])
})
## calculate overall counts for row and column vars
row.nums <- sapply(df[row.vars], function(x) length(x[x==1]))
col.nums <- sapply(df[col.vars], function(x) length(x[x==1]))
## paste row and column counts into row and column names
rownames(tab) <- paste(tab$variable, " (N=", row.nums, ")", sep="")
tab$variable <- NULL
colnames(tab) <- paste(names(tab), " (N=", col.nums, ")", sep="")
## calculate cell percentages (relative to the column N) and paste them in one column at a time
tab[[1]] <- paste(tab[[1]],
                  " (",
                  round(100*(tab[[1]]/col.nums[[1]]), digits=2),
                  "%)",
                  sep="")
tab[[2]] <- paste(tab[[2]],
                  " (",
                  round(100*(tab[[2]]/col.nums[[2]]), digits=2),
                  "%)",
                  sep="")
Now we can look at the result:
## behold the fruits of our labor
tab
               death (N=752)  ICU (N=632)
serum (N=806)   602 (80.05%) 511 (80.85%)
urine (N=739)   556 (73.94%)  462 (73.1%)
brain (N=910)   684 (90.96%) 576 (91.14%)
kidney (N=190)  141 (18.75%) 128 (20.25%)
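For what it is worth, the same kind of table can probably be built more compactly in base R. The following is only a sketch, not the method used above: it assumes every variable is coded 0/1 with no missing values, and it relies on the fact that the cross-product of two 0/1 matrices counts the rows where both indicators are 1. The column name id and the objects counts, pct and tab2 are names introduced here for illustration.

```r
## example data, generated the same way as above (the id column name is an assumption)
set.seed(1)
death  <- runif(1000) <= .75
ICU    <- runif(1000) <= .63
serum  <- runif(1000) <= .80
urine  <- runif(1000) <= .77
brain  <- runif(1000) <= .92
kidney <- runif(1000) <= .22
df <- as.data.frame(cbind(id = 1:1000, death, ICU, serum, urine, brain, kidney))

row.vars <- c("serum", "urine", "brain", "kidney")
col.vars <- c("death", "ICU")

## joint counts: cell [i, j] is the number of rows where
## row variable i and column variable j are both 1
counts <- crossprod(as.matrix(df[row.vars]), as.matrix(df[col.vars]))

## marginal counts of 1's for the row and column variables
row.nums <- colSums(df[row.vars])
col.nums <- colSums(df[col.vars])

## cell percentages relative to the column totals
pct <- round(100 * sweep(counts, 2, col.nums, "/"), digits = 2)

## paste counts and percentages together, with N in the row and column labels
tab2 <- matrix(paste0(counts, " (", pct, "%)"),
               nrow = nrow(counts),
               dimnames = list(paste0(row.vars, " (N=", row.nums, ")"),
                               paste0(col.vars, " (N=", col.nums, ")")))
noquote(tab2)
```

If the data contain NA values, crossprod and colSums would propagate them, so this sketch would need explicit handling of missing values first.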
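Old answer (does not solve the problem at hand, but may be useful for related tasks)
This is one of those things that looks like it should be easy, but somehow isn't.
There is an existing question that covers the tabulation once you have the two columns you want to cross-tabulate. Getting to that point is the easy part: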
# function to generate example data
mkdat <- function() factor(sample(letters[1:4], 10, replace=TRUE), levels=letters[1:4])
# make example data
set.seed(10)
dat <- data.frame(id = 1:10, var1 = mkdat(), var2=mkdat(), var3=mkdat())
# use reshape2 package to reshape from wide to long form
library(reshape2)
dat.m <- melt(dat, id.vars="id")
dat.m$value <- factor(dat.m$value)
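Now cross-tabulating dat.m$variable and dat.m$value gives the correct cells. You can refer to the linked question above for how to go on from there to get counts and percentages in the table, or you can use this approach: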
# tabulate
library(plyr)
tab <- ddply(dat.m, "variable",
             function(DF) {
               # get counts with table
               count <- table(DF$value)
               # convert counts to percent
               prop <- paste(prop.table(count)*100, "%", sep="")
               # combine count and percent
               cp <- paste(count, " (", prop, ")", sep="")
               # re-attach the names
               names(cp) <- levels(DF$value)
               return(cp)
             })
# get row n
tab.r <- table(dat.m$variable)
# get column n
tab.c <- table(dat.m$value)
# paste row and column n into row and column names
colnames(tab) <- paste(colnames(tab), " (n = ", tab.c, ")", sep="")
rownames(tab) <- paste(tab$variable, " (n = ", tab.r, ")", sep="")
tab$variable <- NULL
# works, but that was way too much effort.
print(tab)
Admittedly, this is a lot of work for a simple table of counts and proportions. I would be glad if someone came up with an easier way.
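One possibly simpler route is to let table() and prop.table() do the counting and then paste the pieces together. This is a rough base R sketch rather than a tested replacement: the long-form data frame is stacked by hand as a stand-in for melt, and the names vars, long, counts, pct and tab2 are introduced here for illustration only.

```r
## example data, generated as in the old answer
mkdat <- function() factor(sample(letters[1:4], 10, replace = TRUE), levels = letters[1:4])
set.seed(10)
dat <- data.frame(id = 1:10, var1 = mkdat(), var2 = mkdat(), var3 = mkdat())

## stack the three variables into long form by hand (base R stand-in for melt)
vars <- c("var1", "var2", "var3")
long <- data.frame(
  variable = factor(rep(vars, each = nrow(dat)), levels = vars),
  value    = factor(unlist(lapply(dat[vars], as.character), use.names = FALSE),
                    levels = letters[1:4])
)

## cell counts, and percentages within each variable (row-wise)
counts <- table(long$variable, long$value)
pct <- round(100 * prop.table(counts, margin = 1), digits = 1)

## paste counts and percentages, with n in the row and column labels
tab2 <- matrix(paste0(counts, " (", pct, "%)"),
               nrow = nrow(counts),
               dimnames = list(
                 paste0(rownames(counts), " (n = ", rowSums(counts), ")"),
                 paste0(colnames(counts), " (n = ", colSums(counts), ")")))
noquote(tab2)
```

Keeping all four levels in the value factor means a letter that never occurs still gets a column of zeros instead of silently disappearing.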