I have this dataset:
study_ID title experiment question_ID participant_ID estimate_level estimate correct_answer question type category age gender
<dbl> <chr> <dbl> <chr> <int> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <chr>
1 11 Dallacker_Parents'_co… 1 1 1 individual 3 10 How many sugar cubes does or… unlim… nutriti… 32 Female
2 11 Dallacker_Parents'_co… 1 2 1 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 32 Female
3 11 Dallacker_Parents'_co… 1 3 1 individual 7 6.5 How many sugar cubes does a … unlim… nutriti… 32 Female
4 11 Dallacker_Parents'_co… 1 4 1 individual 1 16.5 How many sugar cubes does a … unlim… nutriti… 32 Female
5 11 Dallacker_Parents'_co… 1 5 1 individual 7 11 How many sugar cubes does a … unlim… nutriti… 32 Female
6 11 Dallacker_Parents'_co… 1 6 1 individual 5 2.5 How many sugar cubes does a … unlim… nutriti… 32 Female
7 11 Dallacker_Parents'_co… 1 1 2 individual 2 10 How many sugar cubes does or… unlim… nutriti… 29 Female
8 11 Dallacker_Parents'_co… 1 2 2 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 29 Female
9 11 Dallacker_Parents'_co… 1 3 2 individual 1.5 6.5 How many sugar cubes does a … unlim… nutriti… 29 Female
10 11 Dallacker_Parents'_co… 1 4 2 individual 2 16.5 How many sugar cubes does a … unlim… nutriti… 29 Female
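There are 6 questions in this dataset, and each one has a correct_answer column and an estimate column. For each question, I am trying to work out the direction of the estimates, so that I get the percentage of people who underestimated, overestimated, or estimated correctly.
For example, for each of the 6 questions it would return something like: 80% underestimated, 10% overestimated, 10% answered correctly.
How can I do this? I'm stumped. Thanks in advance!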
Here is the input:
dput(head(DF, 10))
structure(list(study_ID = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5), title = c("5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd"), experiment = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1), question_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
participant_ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), estimate_level = c("individual",
"individual", "individual", "individual", "individual", "individual",
"individual", "individual", "individual", "individual"),
estimate = c(2e+07, 4500000, 21075541, 2e+07, 1e+06, 1.1e+07,
2.5e+07, 8e+06, 1.6e+07, 9800000), correct = c(3.8e+07, 3.8e+07,
3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07,
3.8e+07), question = c("What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?"),
type = c("unlimited", "unlimited", "unlimited", "unlimited",
"unlimited", "unlimited", "unlimited", "unlimited", "unlimited",
"unlimited"), category = c("demographics", "demographics",
"demographics", "demographics", "demographics", "demographics",
"demographics", "demographics", "demographics", "demographics"
), age = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA"), gender = c("NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
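One possible way to get the per-question percentages described above is sketched here in base R. It is only a sketch under some assumptions: the toy values are made up for illustration, the correct-answer column is assumed to be named correct_answer as in the printout (the dput output calls it correct, so adjust the name to match), and "correct" is assumed to mean an estimate exactly equal to the correct answer. The helper column name direction is hypothetical.

```r
# Toy data in the same shape as the question's data.
# The values are invented for illustration only.
DF <- data.frame(
  question_ID    = c(1, 1, 1, 1, 2, 2, 2, 2),
  estimate       = c(3, 10, 12, 8, 10, 11.5, 20, 11.5),
  correct_answer = c(10, 10, 10, 10, 11.5, 11.5, 11.5, 11.5)
)

# Classify each estimate relative to the correct answer.
# "direction" is a hypothetical helper column, not part of the original data.
DF$direction <- factor(
  ifelse(DF$estimate < DF$correct_answer, "underestimated",
         ifelse(DF$estimate > DF$correct_answer, "overestimated", "correct")),
  levels = c("underestimated", "overestimated", "correct")
)

# Cross-tabulate question by direction, then convert each row to percentages
# (margin = 1 makes every question's row sum to 100).
pct <- round(100 * prop.table(table(DF$question_ID, DF$direction), margin = 1), 1)
print(pct)
```

Each row of pct is one question and each column is one of the three outcomes. Rows with a missing estimate get an NA direction and are left out of the counts by table(), so the percentages are taken over the non-missing answers only. If exact equality is too strict for continuous estimates, the comparison could be replaced by a tolerance around correct_answer.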