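I would like to help you. However, the data you provided has the following problems:
- The variable COND is missing.
- The variable TASK has only one unique value (the CreateTableOne function does not accept variables with a single unique value).
- The variable age has only one unique value.
- The variable ID is repeated many times.
Even without changing the data, though, you can see where the problem lies. With data in this shape you cannot use CreateTableOne! It counts every occurrence of the value m and every occurrence of the value k, and since you have many rows per person, CreateTableOne counts each of those rows separately.
Please take a look at the solution I proposed here: How to describe unique values of grouped observations for several variables?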
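To make the double counting concrete, here is a minimal sketch on made-up data (the toy data frame and its values are assumptions, not your file). Three people have several trial rows each; counting rows overstates each gender, while keeping one row per ID first gives the number of people.

```r
library(dplyr)

# Toy data (hypothetical): 3 people, several trial rows per person
toy <- data.frame(
  ID     = c("A", "A", "A", "B", "B", "C"),
  gender = c("k", "k", "k", "m", "m", "k")
)

# Counting rows: every trial is counted
by_row <- toy %>% count(gender)

# Counting people: keep one row per ID before counting
by_person <- toy %>% distinct(ID, gender) %>% count(gender)

by_row
by_person
```

The second table is the kind of count a per-person descriptive table needs.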
Update 1
OK. Let's face your data. You have 54 patients with distinct IDs.
data_Confidence_in_Action %>% distinct(ID) %>% nrow()
#[1] 54
Note, however, that one ID looks incorrect.
data_Confidence_in_Action %>% distinct(ID) %>%
mutate(lenID = str_length(ID)) %>% filter(lenID!=5)
# A tibble: 1 x 2
# ID lenID
# <chr> <int>
#1 P1419 dots 10
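We can leave it as it is, though. Correct it yourself if you need to. Keep in mind, however, that you have as many as 8 distinct genders. Be careful, because gender ideology is not welcome in our country ;-)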
data_Confidence_in_Action %>% distinct(gender)
# A tibble: 8 x 1
# gender
# <chr>
#1 k
#2 kobieta
#3 M
#4 K
#5 m¦Ö+-czyzna
#6 21
#7 m
#8 M¦Ö+-czyzna
Unfortunately, this needs fixing. On top of that, patient P1440 has their age entered in the gender column. So what is the gender of P1440?
data_Confidence_in_Action %>% filter(gender==21) %>% distinct(ID, gender, age)
# A tibble: 1 x 3
# ID gender age
# <chr> <chr> <dbl>
#1 P1440 21 NA
data_Confidence_in_Action %>% distinct(ID, gender) %>%
group_by(gender) %>% summarise(n = n())
# A tibble: 8 x 2
# gender n
# <chr> <int>
#1 21 1
#2 k 36
#3 K 3
#4 kobieta 9
#5 m 1
#6 M 1
#7 m¦Ö+-czyzna 2
#8 M¦Ö+-czyzna 1
As you can see, you have far more women. So let P1440 be a woman. Is that OK?
Finally, note that two variables have inconvenient names: Condition (whether a person responded) and Go/Nogo (whether a person should respond).
Let's fix all of this in one go.
data_Confidence_in_Action = data_Confidence_in_Action %>%
mutate(
# values starting with k/K and the stray "21" (P1440) become "k"; everything else becomes "m"
gender = ifelse(str_detect(gender, "^([kK]|21$)"), "k", "m"),
age = ifelse(is.na(age), 21, age)
) %>% rename(Condition=`Condition (whether a person responded)`,
Go.Nogo = `Go/Nogo (whether a person should respond)`)
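If you would rather not decide P1440's gender inside the recoding step, a more conservative variant is sketched below. It is an alternative, not what the code above does: it maps only labels it recognizes and leaves everything else as NA so that you can inspect those rows by hand. The toy data frame and the column name gender_clean are made up for illustration.

```r
library(dplyr)

# Toy labels (hypothetical), modelled on the kinds of values listed above
raw <- data.frame(gender = c("k", "kobieta", "K", "M", "m", "21", "x"))

recoded <- raw %>%
  mutate(
    gender_clean = case_when(
      tolower(gender) %in% c("k", "kobieta") ~ "k",            # female labels
      startsWith(tolower(gender), "m")       ~ "m",            # male labels
      TRUE                                   ~ NA_character_   # anything else: decide manually
    )
  )

recoded
```

With this variant the "21" entry stays NA until you assign it explicitly.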
Finally, let's convert some variables from chr to factor, taking care to keep sensible levels. I hope I chose them wisely.
data_Confidence_in_Action = data_Confidence_in_Action %>%
mutate(
ID = ID %>% fct_inorder(),
gender = gender %>% fct_infreq(),
t1.key = t1.key %>% fct_infreq(),
Condition = Condition %>% fct_infreq(),
CR.key = CR.key %>% fct_infreq(),
TASK = TASK %>% fct_infreq(),
Go.Nogo = Go.Nogo %>% fct_infreq(),
difficulty = difficulty %>% factor(c("easy", "medium", "hard"))
)
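With the data organized this way, let's get to the heart of the problem: what you actually want to analyze. Note that for variables such as TASK, Condition and t1.key, every patient has two valid values.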
data_Confidence_in_Action %>% group_by(ID) %>% summarise(
nunique.TASK = length(unique(TASK)),
nunique.Condition = length(unique(Condition)),
nunique.t1.key = length(unique(t1.key))
) %>% distinct(nunique.TASK, nunique.Condition, nunique.t1.key)
# A tibble: 1 x 3
# nunique.TASK nunique.Condition nunique.t1.key
# <int> <int> <int>
#1 2 2 2
However, if we look at the ratio in which the different values occur in these variables, it turns out to be different for each patient.
data_Confidence_in_Action %>% group_by(ID) %>% summarise(
prop.TASK = sum(TASK=="left")/sum(TASK=="right")) %>%
distinct()
data_Confidence_in_Action %>% group_by(ID) %>% summarise(
prop.Condition = sum(Condition=="NR")/sum(Condition=="R"))%>%
distinct()
data_Confidence_in_Action %>% group_by(ID) %>% summarise(
prop.t1.key = sum(t1.key=="None")/sum(t1.key=="space"))%>%
distinct()
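The ratios above divide one count by the other. A closely related summary that is sometimes easier to read is the share of one value per patient, which stays between 0 and 1 even when the other value never occurs. The sketch below uses made-up data; the names share.left and n.trials are hypothetical.

```r
library(dplyr)

# Toy trial-level data (hypothetical): 2 patients, 4 trials each
toy <- data.frame(
  ID   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  TASK = c("left", "right", "left", "right", "left", "left", "left", "right")
)

# Share of "left" trials and number of trials per patient
shares <- toy %>%
  group_by(ID) %>%
  summarise(
    share.left = mean(TASK == "left"),
    n.trials   = n()
  )

shares
```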
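So please state clearly what you want to summarize and how, because it is not clear to me what you want to get.
Update 2
OK. I can see you are starting to understand a few things. Still, I don't know what you want to summarize. See below. First, let's gather all the code that prepares the data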
library(tidyverse)
library(readxl)
library(tableone)
data_Confidence_in_Action <- read_excel("data_Confidence in Action.xlsx")
data_Confidence_in_Action = data_Confidence_in_Action %>%
mutate(
gender = ifelse(str_detect(gender, "^([kK]|21$)"), "k", "m"),
age = ifelse(is.na(age), 21, age)
) %>% rename(Condition=`Condition (whether a person responded)`,
Go.Nogo = `Go/Nogo (whether a person should respond)`)
data_Confidence_in_Action = data_Confidence_in_Action %>%
mutate(
ID = ID %>% fct_inorder(),
gender = gender %>% fct_infreq(),
t1.key = t1.key %>% fct_infreq(),
Condition = Condition %>% fct_infreq(),
CR.key = CR.key %>% fct_infreq(),
TASK = TASK %>% fct_infreq(),
Go.Nogo = Go.Nogo %>% fct_infreq(),
difficulty = difficulty %>% factor(c("easy", "medium", "hard"))
)
Now for the summary. If we do this:
CreateTableOne(
data = data_Confidence_in_Action,
vars = c('TASK', 'Condition', 't1.key', 'T1.response', 'age', 'T1.ACC'),
strata = 'gender',
factorVars = c('TASK', 'Condition', 't1.key'),
argsApprox = list(correct = FALSE),
smd = TRUE,
addOverall = TRUE,
test = TRUE) %>%
kableone()
Output
| |Overall |k |m |p |test |
|:-----------------------|:------------|:------------|:------------|:------|:----|
|n |41713 |37823 |3890 | | |
|TASK = right (%) |20832 (49.9) |18889 (49.9) |1943 (49.9) |0.992 | |
|Condition = R (%) |20033 (48.0) |18130 (47.9) |1903 (48.9) |0.241 | |
|t1.key = space (%) |20033 (48.0) |18130 (47.9) |1903 (48.9) |0.241 | |
|T1.response (mean (SD)) |0.48 (0.50) |0.48 (0.50) |0.49 (0.50) |0.241 | |
|age (mean (SD)) |20.74 (2.67) |20.75 (2.70) |20.60 (2.33) |0.001 | |
|T1.ACC (mean (SD)) |0.70 (0.46) |0.70 (0.46) |0.73 (0.45) |<0.001 | |
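We get a summary over all observations, n == 41713. Since there are many observations per patient, such a summary is of little use. At least that is what I think. We can, however, summarize a few selected patients.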
CreateTableOne(
data = data_Confidence_in_Action %>%
filter(ID %in% c('P1323', 'P1403', 'P1404')) %>%
mutate(ID = ID %>% fct_drop()),
vars = c('TASK', 'Condition', 't1.key', 'T1.response', 'age', 'T1.ACC'),
strata = c('ID'),
factorVars = c('TASK', 'Condition', 't1.key'),
argsApprox = list(correct = FALSE),
smd = TRUE,
addOverall = TRUE,
test = TRUE) %>%
kableone()
Output
| |Overall |P1323 |P1403 |P1404 |p |test |
|:-----------------------|:------------|:------------|:------------|:------------|:------|:----|
|n |2323 |775 |776 |772 | | |
|TASK = right (%) |1164 (50.1) |390 (50.3) |386 (49.7) |388 (50.3) |0.969 | |
|Condition = R (%) |1168 (50.3) |385 (49.7) |435 (56.1) |348 (45.1) |<0.001 | |
|t1.key = space (%) |1168 (50.3) |385 (49.7) |435 (56.1) |348 (45.1) |<0.001 | |
|T1.response (mean (SD)) |0.50 (0.50) |0.50 (0.50) |0.56 (0.50) |0.45 (0.50) |<0.001 | |
|age (mean (SD)) |19.66 (0.94) |19.00 (0.00) |19.00 (0.00) |21.00 (0.00) |<0.001 | |
|T1.ACC (mean (SD)) |0.70 (0.46) |0.67 (0.47) |0.77 (0.42) |0.65 (0.48) |<0.001 | |
This now makes more sense, but it is separate for each patient.
Alternatively, you can make this summary without CreateTableOne, for example like this
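# One row per patient first, then describe age within each gender
data_Confidence_in_Action %>% group_by(gender, ID) %>%
  summarise(
    age = min(age)) %>% group_by(gender) %>%
  summarise(
    n = n(),
    Min = min(age),
    # the third positional argument of quantile() is na.rm, so it is named explicitly
    Q1 = quantile(age, 1/4, na.rm = TRUE),
    mean = mean(age),
    median = median(age),
    Q3 = quantile(age, 3/4, na.rm = TRUE),
    Max = max(age),
    IQR = IQR(age),
    Kurt = e1071::kurtosis(age),
    skew = e1071::skewness(age),
    SD = sd(age))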
Output
# A tibble: 2 x 12
gender n Min Q1 mean median Q3 Max IQR Kurt skew SD
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 k 49 19 19 20.8 20 21 32 2 7.47 2.79 2.73
2 m 5 19 19 20.6 19 21 25 2 -1.29 0.823 2.61
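The same one-row-per-patient idea can also be fed back into CreateTableOne. The following is only a sketch on made-up data: the toy data frame, the per-patient accuracy column acc, and the choice of mean accuracy as the per-patient summary are all assumptions, and test = FALSE is set because the toy groups are tiny.

```r
library(dplyr)
library(tableone)

# Toy trial-level data (hypothetical): 4 patients, 3 trials each
toy <- data.frame(
  ID     = rep(c("A", "B", "C", "D"), each = 3),
  gender = rep(c("k", "k", "m", "m"), each = 3),
  age    = rep(c(19, 21, 20, 25), each = 3),
  T1.ACC = c(1, 0, 1,  1, 1, 1,  0, 0, 1,  1, 1, 0)
)

# Collapse to one row per patient: age is constant within a patient,
# accuracy is averaged over that patient's trials
per_patient <- toy %>%
  group_by(ID, gender) %>%
  summarise(age = first(age), acc = mean(T1.ACC), .groups = "drop")

# Now n counts patients, not trials
tab <- CreateTableOne(
  vars   = c("age", "acc"),
  strata = "gender",
  data   = per_patient,
  test   = FALSE
)
print(tab)
```

Whether mean accuracy per patient is the right summary depends on what you actually want to report.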
Think it over carefully and write down what you really expect. Provided, of course, that this topic is still of interest to you.