Here is one way to do it with the dplyr R package:
library(dplyr)
# your data
df <- data.frame(
  pair      = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  response  = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1))
# data management
df2 <- df %>%
  group_by(pair) %>%
  arrange(treatment) %>%
  # note: funs() is deprecated since dplyr 0.8.0
  summarise_all(funs(toString(na.omit(.))))
df2
## A tibble: 5 x 3
#   pair treatment response
#  <dbl> <chr>     <chr>
#1     0 0, 1      0, 1
#2     1 0, 1      1, 1
#3     2 0, 1      0, 1
#4     3 0, 1      0, 0
#5     4 0, 1      0, 1
# contingency table
df2 %>% summarise(
  a = sum(response == '1, 1'), # count of pairs in which both members responded
  b = sum(response == '1, 0'), # count of pairs in which only the control responded
  c = sum(response == '0, 1'), # count of pairs in which only the treatment responded
  d = sum(response == '0, 0')  # count of pairs in which neither member responded
) %>% matrix(2, 2)
#     [,1] [,2]
#[1,]    1    3
#[2,]    0    1
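Explanation: data management
The goal here is to use summarise_all(funs(toString(na.omit(.)))) to collapse the two response values of each pair into a single row. This lets you count how many pairs in the data have the responses c(1, 1), c(1, 0), c(0, 1) and c(0, 0).
group_by(pair) makes the summarise step that follows run within each pair group.
arrange(treatment) sorts the rows by the treatment column. arrange() ignores grouping by default, so the sort runs over the whole data frame, but the effect within each pair is the same: the control row always comes before the treatment row, that is, each collapsed response reads control first, treatment second.
summarise_all(funs(toString(na.omit(.)))) concatenates all non-NA elements of each column (within each pair group) into a single comma-separated string, giving one row per group.
In particular, because of group_by(pair) and summarise_all(...), df2 has exactly one row for each pair identifier.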
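Because funs() is deprecated in recent dplyr releases, the same collapse can be sketched with across(). This is a minimal sketch that assumes dplyr >= 1.0.0; the name df2_across is only illustrative, and the rows are sorted by pair and treatment up front so the control-first order is explicit.

```r
library(dplyr)

# Same example data as in the answer
df <- data.frame(
  pair      = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  response  = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1))

# df2_across is a hypothetical name; assumes dplyr >= 1.0.0 for across()
df2_across <- df %>%
  arrange(pair, treatment) %>%   # control (0) before treatment (1) in each pair
  group_by(pair) %>%
  # collapse every non-grouping column to one comma-separated string per pair
  summarise(across(everything(), ~ toString(na.omit(.x))))

df2_across
```

The result should have the same shape as df2: one row per pair, with treatment and response as character columns.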
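Explanation: contingency table
Inside summarise(...), the number of TRUE results for each response condition is assigned to its own variable (a, b, c, d). The contingency table (a matrix) is then built from these counts, laid out the same way as matrix(c(a, b, c, d), 2, 2), which fills the matrix column by column.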
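Piping the one-row summary straight into matrix() hands it a data frame rather than a plain vector. If a numeric matrix with labelled margins is wanted, the layout can be sketched in base R as follows. The vector of collapsed responses is copied from the df2 output above, and the dimnames labels (treatment, control) are my own additions for readability, not part of the original answer.

```r
# Collapsed responses from df2 (control first, treatment second)
response <- c("0, 1", "1, 1", "0, 1", "0, 0", "0, 1")

counts <- c(
  a = sum(response == "1, 1"),  # both members responded
  b = sum(response == "1, 0"),  # only the control responded
  c = sum(response == "0, 1"),  # only the treatment responded
  d = sum(response == "0, 0"))  # neither member responded

# matrix() fills column by column: a and b form column 1, c and d column 2.
# Rows therefore index the treatment response, columns the control response.
tab <- matrix(counts, 2, 2,
              dimnames = list(treatment = c("1", "0"),
                              control   = c("1", "0")))
tab
```

With this layout the discordant pairs sit off the diagonal: tab["1", "0"] holds c and tab["0", "1"] holds b.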