r - R：如何修改 CreateTable() 函数以进行重复观察并使用错误索引？

Question

我正在尝试在以下数据集上创建一个表，我在这里报告前 50 个观察结果。下面报告了我正在处理的数据集。

我建议将 age 和 gnder 变量的一些拼写错误修复如下：

colnames(d)[8] <- 'COND'
d$gender = ifelse(tolower(substr(d$gender,1,1)) == "k", "F", "M") 
library(libr)
d <- datastep(d, {
  if (is.na(age)) {
    age <- 21
  }}
)

我正在尝试使用以下代码创建一个汇总表：

CreateTableOne(
  vars = c('TASK', 'COND', 't1.key', 'T1.response', 'age', 'T1.ACC'), 
  strata = c('ID'),
  factorVars = c('gender'), 
  argsApprox = list(correct = FALSE), 
  smd = TRUE, 
  addOverall = TRUE, 
  test = TRUE) %>% 
  na.omit() %>% 
  kableone()

获得这张表

但是，您如何从这个函数中看到，因为我对同一主题有很多观察，我只计算了 54 个 ID，因此女性和男性的数量是不正确的。

length(unique(d$ID)) 
[1] 54

任何人都知道如何解决它？此外，由于“年龄”和“T1.ACC”具有非正态分布，任何人都知道我如何用中位数和 Q1 和 Q3 替换它们，例如？

score 1 · Accepted Answer

我想帮助你。但是，您提供的数据存在以下问题：

变量COND丢失
变量只有一个唯一值TASK（ CreateTableOne函数不接受具有唯一值的变量）。
变量只有一个唯一值age。
变量ID重复多次。

但是，即使不更改数据，您也可以看到问题所在。如果您有此表格中的数据，则不能使用CreateTableOne！这是因为它计算 value 的m每次出现和 value 的每次出现 k。而且由于您有一个人的多个条目，因此该CreateTableOne函数将分别计算每个出现的次数。

请看一下我在这里提出的解决方案如何描述几个变量的分组观察的唯一值？.

更新 1

好的。让我们尝试面对您的数据。您有 54 名具有不同 ID 的患者。

data_Confidence_in_Action %>% distinct(ID) %>% nrow()
#[1] 54

但是，请注意，一个 ID 似乎不正确。

data_Confidence_in_Action %>% distinct(ID) %>%
  mutate(lenID = str_length(ID)) %>% filter(lenID!=5)
#  A tibble: 1 x 2
#  ID         lenID
#  <chr>      <int>
#1 P1419 dots    10

但是，我们可以保持原样。如果需要，请自行更正。但是，请记住，您有多达 8 种不同的性别。要小心，因为在我们国家，性别意识形态不受欢迎；-)

data_Confidence_in_Action %>% distinct(gender)
#  A tibble: 8 x 1
#  gender     
#  <chr>      
#1 k          
#2 kobieta    
#3 M          
#4 K          
#5 m¦Ö+-czyzna
#6 21         
#7 m          
#8 M¦Ö+-czyzna

不幸的是，这需要修复。不幸的是，患者 P1440 按性别分配了年龄。那么P1440的性别是什么？

data_Confidence_in_Action %>% filter(gender==21) %>% distinct(ID, gender, age)
#  A tibble: 1 x 3
#  ID    gender   age
#  <chr> <chr>  <dbl>
#1 P1440 21        NA

data_Confidence_in_Action %>% distinct(ID, gender) %>% 
  group_by(gender) %>% summarise(n = n())
#  A tibble: 8 x 2
#  gender          n
#  <chr>       <int>
#1 21              1
#2 k              36
#3 K               3
#4 kobieta         9
#5 m               1
#6 M               1
#7 m¦Ö+-czyzna     2
#8 M¦Ö+-czyzna     1

正如你所看到的，你有更多的女人。所以让P1440成为一个女人。会好的？

最后，请注意这两个变量的名称不方便。它大约是Condition (whether a person responded)和Go / Nogo (whether a person should respond)。

让我们一口气解决所有问题。

data_Confidence_in_Action = data_Confidence_in_Action %>% 
  mutate(
    gender = ifelse(str_detect(gender, "[k,K,21]"),"k","m"),
    age = ifelse(is.na(age), 21, age)
  ) %>% rename(Condition=`Condition (whether a person responded)`, 
               Go.Nogo = `Go/Nogo (whether a person should respond)`)

最后，让我们将一些变量从更改chr为factor，但不要替换正确的级别。我希望我明智地接受它。

data_Confidence_in_Action = data_Confidence_in_Action %>% 
  mutate(
    ID = ID %>% fct_inorder(),
    gender = gender %>% fct_infreq(),
    t1.key = t1.key %>% fct_infreq(),
    Condition = Condition %>% fct_infreq(),
    CR.key = CR.key %>% fct_infreq(),
    TASK = TASK %>% fct_infreq(),
    Go.Nogo = Go.Nogo %>% fct_infreq(),
    difficulty = difficulty %>% factor(c("easy", "medium", "hard"))
  )

以这种方式组织数据后，让我们进入问题的核心。你真正想分析什么。请注意，对于、和等变量TASK，每个申请人都有两个有效值。 Conditiont1.key

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  nunique.TASK = length(unique(TASK)),
  nunique.Condition = length(unique(Condition)),
  nunique.t1.key = length(unique(t1.key))
) %>% distinct(nunique.TASK, nunique.Condition, nunique.t1.key)
#  A tibble: 1 x 3
#  nunique.TASK nunique.Condition nunique.t1.key
#         <int>             <int>          <int>
#1            2                 2              2

但是，如果我们查看这些变量中不同值出现的比例，就会发现每个患者的情况都不同。

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  prop.TASK = sum(TASK=="left")/sum(TASK=="right")) %>% 
  distinct()

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  prop.Condition = sum(Condition=="NR")/sum(Condition=="R"))%>% 
  distinct()

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  prop.t1.key = sum(t1.key=="None")/sum(t1.key=="space"))%>% 
  distinct()

所以写清楚你想总结什么以及如何总结，因为我不清楚你想得到什么。

更新 2

好的。我可以看到你开始明白一些事情了。不过，我不知道你想总结什么。往下看。首先，让我们收集所有代码以准备数据

library(tidyverse)
library(readxl)
library(tableone)
data_Confidence_in_Action <- read_excel("data_Confidence in Action.xlsx")

data_Confidence_in_Action = data_Confidence_in_Action %>%
  mutate(
    gender = ifelse(str_detect(gender, "[k,K,21]"),"k","m"),
    age = ifelse(is.na(age), 21, age)
  ) %>% rename(Condition=`Condition (whether a person responded)`,
               Go.Nogo = `Go/Nogo (whether a person should respond)`)

data_Confidence_in_Action = data_Confidence_in_Action %>%
  mutate(
    ID = ID %>% fct_inorder(),
    gender = gender %>% fct_infreq(),
    t1.key = t1.key %>% fct_infreq(),
    Condition = Condition %>% fct_infreq(),
    CR.key = CR.key %>% fct_infreq(),
    TASK = TASK %>% fct_infreq(),
    Go.Nogo = Go.Nogo %>% fct_infreq(),
    difficulty = difficulty %>% factor(c("easy", "medium", "hard"))
  )

现在是总结。如果我们这样做：

CreateTableOne(
  data = data_Confidence_in_Action,
  vars = c('TASK', 'Condition', 't1.key', 'T1.response', 'age', 'T1.ACC'), 
  strata = 'gender',
  factorVars = c('TASK', 'Condition', 't1.key'), 
  argsApprox = list(correct = FALSE), 
  smd = TRUE, 
  addOverall = TRUE, 
  test = TRUE) %>% 
  kableone()

输出

|                        |Overall      |k            |m            |p      |test |
|:-----------------------|:------------|:------------|:------------|:------|:----|
|n                       |41713        |37823        |3890         |       |     |
|TASK = right (%)        |20832 (49.9) |18889 (49.9) |1943 (49.9)  |0.992  |     |
|Condition = R (%)       |20033 (48.0) |18130 (47.9) |1903 (48.9)  |0.241  |     |
|t1.key = space (%)      |20033 (48.0) |18130 (47.9) |1903 (48.9)  |0.241  |     |
|T1.response (mean (SD)) |0.48 (0.50)  |0.48 (0.50)  |0.49 (0.50)  |0.241  |     |
|age (mean (SD))         |20.74 (2.67) |20.75 (2.70) |20.60 (2.33) |0.001  |     |
|T1.ACC (mean (SD))      |0.70 (0.46)  |0.70 (0.46)  |0.73 (0.45)  |<0.001 |     |

我们得到所有观察结果的摘要n == 41713。而且由于对每个病人都有很多观察，所以这样的总结是没有多大用处的。至少我是这么认为的。但是，我们可以总结一些选定的患者。

CreateTableOne(
  data = data_Confidence_in_Action %>% 
    filter(ID %in% c('P1323', 'P1403', 'P1404')) %>% 
    mutate(ID = ID %>% fct_drop()),
  vars = c('TASK', 'Condition', 't1.key', 'T1.response', 'age', 'T1.ACC'), 
  strata = c('ID'),
  factorVars = c('TASK', 'Condition', 't1.key'), 
  argsApprox = list(correct = FALSE), 
  smd = TRUE, 
  addOverall = TRUE, 
  test = TRUE) %>% 
  kableone()

输出

|                        |Overall      |P1323        |P1403        |P1404        |p      |test |
|:-----------------------|:------------|:------------|:------------|:------------|:------|:----|
|n                       |2323         |775          |776          |772          |       |     |
|TASK = right (%)        |1164 (50.1)  |390 (50.3)   |386 (49.7)   |388 (50.3)   |0.969  |     |
|Condition = R (%)       |1168 (50.3)  |385 (49.7)   |435 (56.1)   |348 (45.1)   |<0.001 |     |
|t1.key = space (%)      |1168 (50.3)  |385 (49.7)   |435 (56.1)   |348 (45.1)   |<0.001 |     |
|T1.response (mean (SD)) |0.50 (0.50)  |0.50 (0.50)  |0.56 (0.50)  |0.45 (0.50)  |<0.001 |     |
|age (mean (SD))         |19.66 (0.94) |19.00 (0.00) |19.00 (0.00) |21.00 (0.00) |<0.001 |     |
|T1.ACC (mean (SD))      |0.70 (0.46)  |0.67 (0.47)  |0.77 (0.42)  |0.65 (0.48)  |<0.001 |     |

这现在更有意义，但对每个患者都是分开的。

或者，您可以在不使用的情况下进行此摘要CreateTableOne，例如 yes

data_Confidence_in_Action %>% group_by(gender, ID) %>% 
  summarise(
    age = min(age)) %>% group_by(gender) %>% 
  summarise(
    n = n(),
    Min = min(age),
    Q1 = quantile(age,1/4,8),
    mean = mean(age),
    median = median(age),
    Q3 = quantile(age,3/4,8),
    Max = max(age),
    IQR = IQR(age),
    Kurt = e1071::kurtosis(age),
    skew = e1071::skewness(age),
    SD = sd(age))

输出

# A tibble: 2 x 12
  gender     n   Min    Q1  mean median    Q3   Max   IQR  Kurt  skew    SD
  <fct>  <int> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 k         49    19    19  20.8     20    21    32     2  7.47 2.79   2.73
2 m          5    19    19  20.6     19    21    25     2 -1.29 0.823  2.61

仔细考虑并写下您真正期望的内容。当然，除非这个话题对你来说仍然很有趣。

r - R：如何修改 CreateTable() 函数以进行重复观察并使用错误索引？

1 回答 1

Related

Reference