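A first step could be to reshape your wide data into a long format (age, sex, then one column for the question and one column for the answer to that question). With this long, or tidy, data you can easily group by question, age and sex, and calculate the proportion of each answer.
Code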
library(tidyverse)

df %>%
  # Reshape to long format: one row per respondent and question
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  group_by(Question, Sex, `Age Group`) %>%
  # Proportion of each answer code (1 to 7) within each group
  summarise(`Strongly Agree` = sum(Value == 7)/n(),
            `Slightly Agree` = sum(Value == 6)/n(),
            Agree = sum(Value == 5)/n(),
            Neutral = sum(Value == 4)/n(),
            Disagree = sum(Value == 3)/n(),
            `Slightly Disagree` = sum(Value == 2)/n(),
            `Strongly Disagree` = sum(Value == 1)/n())
Output
# A tibble: 16 x 10
# Groups:   Question, Sex [8]
  Question   Sex `Age Group` `Strongly Agree` `Slightly Agree` Agree Neutral Disagree `Slightly Disagree` `Strongly Disagree`
  <chr>    <int> <fct>                  <dbl>            <dbl> <dbl>   <dbl>    <dbl>               <dbl>               <dbl>
1 Q31          1 30-39                      0                0     0       0        0                   0                   1
2 Q31          1 40-49                      0                0     0       1        0                   0                   0
3 Q31          2 30-39                      0                0     0       0        1                   0                   0
4 Q31          2 40-49                      0                0     0       0        0                   1                   0
Note: in your example table 2, each Sex x Age Group combination occurs exactly once, so every proportion for your example is either 0 or 1.
Data
df <- structure(list(Sex = c(1L, 2L, 1L, 2L), `Age Group` = structure(c(1L,
1L, 2L, 2L), .Label = c("30-39", "40-49"), class = "factor"),
Q31 = c(1L, 3L, 4L, 2L), Q32 = c(7L, 5L, 6L, 2L), Q33 = 1:4,
Q34 = c(5L, 6L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
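As a variation on the code above, here is a minimal sketch that keeps the result in long format and counts each answer code instead of writing one `summarise()` expression per level. This is not the approach used in the answer itself, only an alternative under the same assumptions about the data; the names `props` and `Proportion` are chosen purely for illustration, and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as above, rebuilt so this block is self-contained
df <- data.frame(
  Sex = c(1L, 2L, 1L, 2L),
  `Age Group` = factor(c("30-39", "30-39", "40-49", "40-49")),
  Q31 = c(1L, 3L, 4L, 2L),
  Q32 = c(7L, 5L, 6L, 2L),
  Q33 = 1:4,
  Q34 = c(5L, 6L, 2L, 2L),
  check.names = FALSE  # keep the space in the `Age Group` column name
)

props <- df %>%
  # Long format: one row per respondent and question
  pivot_longer(cols = -c(Sex, `Age Group`),
               names_to = "Question",
               values_to = "Value") %>%
  # Count how often each answer code occurs in each group
  count(Question, Sex, `Age Group`, Value) %>%
  group_by(Question, Sex, `Age Group`) %>%
  # Convert the counts to within-group proportions
  mutate(Proportion = n / sum(n)) %>%
  ungroup()

print(props)
```

Only answer codes that actually occur in a group get a row here, so answers with a proportion of 0 are simply absent. If you need the wide layout with one column per answer, the `summarise()` version above is probably the more direct route.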