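I am trying to calculate the proportion of correct responses for each participant as a function of three factors (group, sound, and language). My data frame looks like this: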

participant group   sound   lang    resp 
advf03      adv     a       in      1
advf03      adv     a       sp      0
advf03      adv     a       in      1
advf03      adv     a       sp      0
advf03      adv     a       in      0
advf03      adv     a       sp      1
advf03      adv     a       sp      0
advf03      adv     a       in      1
advf03      adv     a       in      0
advf03      adv     a       in      1
begf03      beg     a       in      1
begf03      beg     a       in      1
begf03      beg     a       sp      0

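"group" has 3 levels: adv, int, and beg. "sound" has 3 levels: a, e, i. "lang" has 2 levels: in, sp. A "1" is a correct response and a "0" is an incorrect response. I would like the proportion of "1"s for each participant (i.e., percent correct) as a new column in a new data frame. An example of the kind of information I want: participant advf03 was 53% correct for "a" in "sp".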

Here are 50 observations from my data:

structure(list(sound = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("a", 
"e", "i"), class = "factor"), resp = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), participant = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("2advf03", "2advf05", "2advm04", "2advm06", "2begf01", 
"2begf02", "2begf04", "2begf05", "2begm03", "2advf01", "2intf01", 
"2intf03", "2intf04", "2intf06", "2advm05"), class = "factor"), 
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("adv", 
"beg", "int"), class = "factor"), lang = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("in", "sp"), class = "factor")), .Names = c("sound", 
"resp", "participant", "group", "lang"), row.names = c(10L, 31L, 
36L, 43L, 47L, 49L, 52L, 59L, 61L, 65L, 66L, 68L, 71L, 79L, 97L, 
99L, 106L, 125L, 133L, 138L, 147L, 149L, 162L, 165L, 174L, 175L, 
33L, 37L, 112L, 136L, 154L, 186L, 11L, 50L, 89L, 92L, 104L, 105L, 
123L, 126L, 129L, 143L, 153L, 173L, 177L, 187L, 188L, 191L, 7L, 
12L), class = "data.frame")

This is what I have so far:

# get counts of subsets of factors
df <- as.data.frame(table(df))

# new column that gives the proportion of responses
df$prop <- df$Freq / 32

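But this does not seem to give me the correct proportions. I know I need to reduce the data so that I have fewer observations (i.e., one value for each sound in each language for each participant), but I don't know the right steps to do so.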

1 Answer


If I understand your question correctly, you would like to know the proportion of 1s by participant, sound, and language.

Because the proportion of 1s in a vector with only 0s and 1s is just the mean, this should work:

aggregate(resp ~ participant + group + sound + lang, data = df, FUN = mean)

The output of that with your 50 observations is:

  participant group sound lang      resp
1     2advf03   adv     a   in 0.1875000
2     2advf03   adv     a   sp 0.1111111
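
To illustrate why the mean works where dividing counts by a fixed 32 does not, here is a minimal sketch on made-up data. The `toy` data frame and the column names `prop`, `n`, and `pct` are assumptions for illustration only, not part of your data. The point is that the number of trials can differ from cell to cell, so each proportion needs its own denominator, which `mean` supplies automatically. The sketch groups by sound as well, since you asked for one value per sound per language per participant.

```r
# Toy data (hypothetical): one participant, unequal trial counts per language
toy <- data.frame(
  participant = "p01",
  group       = "adv",
  sound       = "a",
  lang        = c("in", "in", "in", "in", "sp", "sp"),
  resp        = c(1L, 0L, 1L, 1L, 0L, 1L)
)

# For a 0/1 vector, the mean equals the count of 1s divided by the length
stopifnot(isTRUE(all.equal(mean(toy$resp), sum(toy$resp) / length(toy$resp))))

# Proportion correct per participant x group x sound x lang
acc <- aggregate(resp ~ participant + group + sound + lang, data = toy, FUN = mean)
names(acc)[names(acc) == "resp"] <- "prop"

# Number of trials behind each proportion (the per-cell denominator)
n_trials <- aggregate(resp ~ participant + group + sound + lang, data = toy, FUN = length)
acc$n <- n_trials$resp

# Express the proportion as a percentage in a new column
acc$pct <- 100 * acc$prop

print(acc)
```

In your 50-row sample, 2advf03 has 32 "in" trials but only 18 "sp" trials, so the "sp" proportion is 2/18 rather than 2/32.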
Answered 2013-12-05T19:26:49.153