我正在尝试对一些行/变量(分类和连续)进行分组,以帮助提高大型数据集中的表格可读性。
这是虚拟数据集:
library(gtsummary)
library(tidyverse)
library(gt)
set.seed(11012021)
# Create Dataset
PIR <-
tibble(
siteidn = sample(c("1324", "1329", "1333", "1334"), 5000, replace = TRUE, prob = c(0.2, 0.45, 0.15, 0.2)) %>% factor(),
countryname = sample(c("NZ", "Australia"), 5000, replace = TRUE, prob = c(0.3, 0.7)) %>% factor(),
hospt = sample(c("Metropolitan", "Rural"), 5000, replace = TRUE, prob = c(0.65, 0.35)) %>% factor(),
age = rnorm(5000, mean = 60, sd = 20),
apache2 = rnorm(5000, mean = 18.5, sd=10),
apache3 = rnorm(5000, mean = 55, sd=20),
mechvent = sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)) %>% factor(),
sex = sample(c("Female", "Male"), 5000, replace = TRUE) %>% factor(),
patient = TRUE
) %>%
mutate(patient_id = row_number())%>%
group_by(
siteidn) %>% mutate(
count_site = row_number() == 1L) %>%
ungroup()%>%
group_by(
patient_id) %>% mutate(
count_pt = row_number() == 1L) %>%
ungroup()
然后我使用以下代码生成我的表:
t1 <- PIR %>%
select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, apache2, mechvent, countryname) %>%
tbl_summary(
by = countryname,
missing = "no",
statistic = list(
patientn ~ "{n}",
siten ~ "{n}",
age ~ "{mean} ({sd})",
apache2 ~ "{mean} ({sd})",
mechvent ~ "{n} ({p}%)",
sex ~ "{n} ({p}%)",
apache3 ~ "{mean} ({sd})"),
label = list(
siten = "Number of ICUs",
patientn = "Number of Patients",
age = "Age",
apache2 = "APACHE II Score",
mechvent = "Mechanical Ventilation",
sex = "Sex",
apache3 = "APACHE III Score")) %>%
modify_header(stat_by = "**{level}**") %>%
add_overall(col_label = "**Overall**")
t2 <- PIR %>%
select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, apache2, mechvent, hospt) %>%
tbl_summary(
by = hospt,
missing = "no",
statistic = list(
patientn ~ "{n}",
siten ~ "{n}",
age ~ "{mean} ({sd})",
apache2 ~ "{mean} ({sd})",
mechvent ~ "{n} ({p}%)",
sex ~ "{n} ({p}%)",
apache3 ~ "{mean} ({sd})"),
label = list(
siten = "Number of ICUs",
patientn = "Number of Patients",
age = "Age",
apache2 = "APACHE II Score",
mechvent = "Mechanical Ventilation",
sex = "Sex",
apache3 = "APACHE III Score")) %>%
modify_header(stat_by = "**{level}**")
tbl <-
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Country**", "**Hospital Type**")
) %>%
modify_spanning_header(stat_0_1 ~ NA) %>%
modify_footnote(everything() ~ NA)
这将产生下表:
我想将某些行组合在一起以便于阅读。理想情况下,我希望表格如下所示:
我尝试使用 gt 包,代码如下:
tbl <-
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Country**", "**Hospital Type**")
) %>%
modify_spanning_header(stat_0_1 ~ NA) %>%
modify_footnote(everything() ~ NA) %>%
as_gt() %>%
gt::tab_row_group(
group = "Severity of Illness Scores",
rows = 7:8) %>%
gt::tab_row_group(
group = "Patient Demographics",
rows = 3:6) %>%
gt::tab_row_group(
group = "Numbers",
rows = 1:2)
这将产生所需的表:
我这样做的方式有几个问题。
当我尝试使用行名(变量)时,会出现一条错误消息(不能对不存在的列进行子集...)。有没有办法通过使用变量名来做到这一点?对于较大的表,我在使用分配行名的行号方法时遇到了一些麻烦。当有一个变量在移动到末尾以考虑分组行时丢失其位置时尤其如此。
在管道到 tbl_summary 之前有没有办法做到这一点?虽然我喜欢这个表格的输出,但我使用 Word 作为统计报告的输出文档,并且希望能够在需要时(或由我的合作者)在 Word 中格式化表格。我通常使用 gtsummary::as_flextable 进行表格输出。
再次感谢,
本