r - Likert stacked bar chart with pre- and post-test in ggplot

Question

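I am a teacher and new to R, so thank you for your patience. I have searched many other questions on Likert stacked bar charts (this one is close, but not quite what I am struggling with). I cannot seem to find one that discusses how to pull pre-test and post-test survey results into the same stacked bar chart. I have read Hadley Wickham's R for Data Science, examples on GitHub, the R Companion Handbook, and the R Cookbook. As a beginner, I still really need some help.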

I have a set of 12 student questions, each with a pre-test and a post-test response on a scale from "Strongly agree" to "Strongly disagree".

My question is: how did the students' survey responses change from pre-test to post-test?

My data originally looks like this:

Student sex(F=0,M=1)  PreTestQ1   PostTestQ1
1       0              Agree      Disagree
2       0              Disagree   Agree
3       1              Agree      Agree
4       1              Disagree   Agree

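First, I converted agree/disagree to numeric data (Strongly agree = 1, Strongly disagree = 4, no neutral option) and tidied the data from wide to long using the following: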

# Set data frame as wide
library(readxl)  # read_xls()
library(tidyr)   # gather() and the %>% pipe
msse_wide <- read_xls("ProcessDataMSSE.xls")
colnames(msse_wide) # Displays names of columns
head(msse_wide)


# Set data frame as long, after running wide code above
msse_long <- msse_wide %>%
  gather(question,obs_prepost, c(2:25)) # This pulls the columns from 2 to 25 (not including the "sex" column), test it out first as a precaution

# NOW MY DATA IS TIDY!!!! :)

I got:

> msse_long
# A tibble: 1,824 x 3
     sex question obs_prepost
   <dbl> <chr>          <dbl>
 1     0 1Pre               3
 2     0 1Pre               3
 3     0 1Pre               2
 4     0 1Pre               3
 5     0 1Pre               2
 6     0 1Pre               3
 7     0 1Pre               3
 8     0 1Pre               2
 9     0 1Pre               2
10     0 1Pre               4
# … with 1,814 more rows

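Now I want to visualize the percentage of "Strongly agree" through "Strongly disagree" responses as stacked bars, with pre-test and post-test compared side by side (so with 12 questions, pre and post, I will have 24 stacked bars).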

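The end goal is something like this example from R Companion: Simple Stacked Bar Chart... except that I am stuck on how to extract percentages from my data and how to compare pre-test and post-test one above the other.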

score 3 · Accepted Answer

Something like this?

Data:

msse_wide <- read.table(text='Student sex(F=0,M=1)  PreTestQ1   PostTestQ1
                              1       0              Agree      Disagree
                              2       0              Disagree   Agree
                              3       1              Agree      Agree
                              4       1              Disagree   Agree',
                        header=TRUE,
                        stringsAsFactors=FALSE)

A suggested solution using dplyr, tidyr, ggplot2 and scales:

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
msse_wide %>% 
  pivot_longer(cols = -c(Student, sex.F.0.M.1.),
               names_to = "Test") %>% 
  group_by(Test, value) %>% 
  summarise(N = n()) %>%
  mutate(Pct = N / sum(N)) %>% 
  ggplot(aes(Test, Pct, fill = value)) +
    geom_bar(stat="identity") +
    scale_y_continuous(labels = percent)

Edit:

Thanks to @dc37 for the comment:

Adding

+ coord_flip()

to the code above flips the chart so that the bars run horizontally.

Explanation:

Starting from the data, we use pivot_longer, tidyr's successor to gather, to get the desired long structure.
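As a small illustration of that first step, here is a sketch comparing the older gather call with pivot_longer on a toy data frame. The data frame `wide` and its values are made up for this example; only the column naming follows the sample data above.

```r
library(tidyr)

# Toy data in the same shape as the example above (values are made up)
wide <- data.frame(
  Student    = 1:2,
  PreTestQ1  = c("Agree", "Disagree"),
  PostTestQ1 = c("Disagree", "Agree")
)

# Older interface: gather(data, key, value, columns)
long_gather <- gather(wide, Test, value, -Student)

# Current interface: pivot_longer(data, cols, names_to, values_to)
long_pivot <- pivot_longer(wide, cols = -Student,
                           names_to = "Test", values_to = "value")

long_gather
long_pivot
```

Both results hold the same four Student/Test/value rows, although the row order differs: gather stacks one column after another, while pivot_longer works through the data row by row.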

Then we group by Test and value (the individual answer level) and use dplyr's n function to summarise by counting the cases in each group.

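Then we mutate (in this case, create) a column in which the count for each combination of Test and value is divided by the total count for that Test. This works because, after summarise, dplyr keeps the data grouped only by the first grouping variable, Test.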
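The grouping behaviour described here can be checked directly. The following is a minimal sketch on made-up data; it assumes dplyr 1.0 or later, where the `.groups` argument of summarise is available and `"drop_last"` spells out the default behaviour.

```r
library(dplyr)

# Made-up long data: two tests, a handful of answers
long <- data.frame(
  Test  = c("PreTestQ1", "PreTestQ1", "PostTestQ1", "PostTestQ1"),
  value = c("Agree", "Disagree", "Agree", "Agree")
)

counts <- long %>%
  group_by(Test, value) %>%
  # Count rows per Test/value pair; "drop_last" removes only the `value` grouping
  summarise(N = n(), .groups = "drop_last")

# Shows which grouping variables are still active after summarise
group_vars(counts)

# sum(N) is therefore computed within each Test, not over the whole table
pct <- counts %>% mutate(Pct = N / sum(N))
pct
```

Because the percentages are computed within each Test, they add up to 100% per bar in the plot.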

Finally, we use ggplot2 to plot the data and scales to label the percentage axis.
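The same steps might carry over to the numerically coded long data from the question, where the question column holds names such as "1Pre" and "1Post". The sketch below is only an illustration under assumptions: the data are simulated rather than taken from the survey, the names `item`, `phase`, `response` and `plot_data` are invented here, and the two middle labels ("Agree", "Disagree") are assumed, since the question only states what codes 1 and 4 mean.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

# Simulated data shaped like msse_long (NOT the real survey responses)
set.seed(1)
msse_long <- expand.grid(student  = 1:20,
                         question = c("1Pre", "1Post", "2Pre", "2Post"),
                         stringsAsFactors = FALSE) %>%
  mutate(obs_prepost = sample(1:4, n(), replace = TRUE))

plot_data <- msse_long %>%
  # Split "1Pre" into item = 1 and phase = "Pre"
  tidyr::extract(question, into = c("item", "phase"),
                 regex = "^(\\d+)(Pre|Post)$", convert = TRUE) %>%
  mutate(phase    = factor(phase, levels = c("Pre", "Post")),
         # Labels for codes 2 and 3 are an assumption
         response = factor(obs_prepost, levels = 1:4,
                           labels = c("Strongly agree", "Agree",
                                      "Disagree", "Strongly disagree"))) %>%
  count(item, phase, response, name = "N") %>%
  group_by(item, phase) %>%
  mutate(Pct = N / sum(N)) %>%   # share of each response within one bar
  ungroup()

# One panel per question, with a Pre bar and a Post bar in each panel
p <- ggplot(plot_data, aes(phase, Pct, fill = response)) +
  geom_col() +
  facet_wrap(~ item, labeller = label_both) +
  scale_y_continuous(labels = percent) +
  coord_flip()
p
```

Faceting by question keeps each Pre bar next to its Post bar, which makes the shift in responses easier to read than 24 bars on a single axis.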

score -1 · Accepted Answer

I have something that seems to work, except that the pre/post handling is off. This is in base R. Thanks for your help!

if(!require(psych)){install.packages("psych")}
if(!require(likert)){install.packages("likert")}
library(readxl)
setwd("MSSE 507 Capstone Data Analysis/")
Data <- read_xls("ProcessDataMSSE.xls")

str(Data) # tbl_df, tbl, and data.frame classes

### Change Likert scores to factor and specify levels; factors because numeric values are ordinal

Data <- Data[, c(3:26)] # Get rid of the other columns! (Drop multiple columns) 

# Every remaining column (1Pre, 1Post, ..., 12Pre, 12Post) gets the same
# treatment, so convert them all to ordered factors in one step
Data[] <- lapply(Data, factor,
                 levels = c("1", "2", "3", "4"),
                 ordered = TRUE)

Data
### Double check the data frame

library(psych) # Loads psych package

headTail(Data) # Displays last few and first few data

str(Data) # Shows structure of an object (observations and variables, etc.) - in this case, ordinal factors with 4 levels (1 through 4)

summary(Data) # Summary of the number of times you see a data point

Data$`1Pre` # This allows us to check how many data points are really there

str(Data)
### Summarise and plot the responses with the likert package

library(likert)

Data <- as.data.frame(Data) # Makes the tibble a data frame

likert(Data) # This will give the percentage responses for each level and group

Result = likert(Data)

summary(Result) # This will give the mean and SD 


plot(Result,
     main = "Pre and Post Treatment Percentage Responses",
     ylab="Questions",
     type="bar")
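
One possible way to handle the pre/post comparison within the likert package is its `grouping` argument: stack the pre-test and post-test columns of each question into a single item column and pass a vector marking which rows are pre and which are post. The sketch below uses made-up responses for two questions, and the names `pre`, `post`, `items` and `phase` are invented for illustration. It has not been checked against the real spreadsheet.

```r
# Made-up responses for two questions (NOT the real survey data)
set.seed(42)
lv <- c("1", "2", "3", "4")
draw <- function() factor(sample(lv, 30, replace = TRUE), levels = lv, ordered = TRUE)
Data <- data.frame(`1Pre` = draw(), `1Post` = draw(),
                   `2Pre` = draw(), `2Post` = draw(),
                   check.names = FALSE)

# Split into pre and post halves and give matching questions the same name
pre  <- Data[, grepl("Pre$",  names(Data)), drop = FALSE]
post <- Data[, grepl("Post$", names(Data)), drop = FALSE]
names(pre)  <- paste0("Q", sub("Pre$",  "", names(pre)))
names(post) <- paste0("Q", sub("Post$", "", names(post)))

# Stack the halves: one column per question, one row per student and phase
items <- rbind(pre, post)
phase <- factor(rep(c("Pre", "Post"), each = nrow(Data)),
                levels = c("Pre", "Post"))

# Only run the likert part when the package is installed
if (requireNamespace("likert", quietly = TRUE)) {
  Result <- likert::likert(items, grouping = phase)
  print(plot(Result))
}
```

With a grouping vector, the plot is drawn per question with one bar for each group, so the Pre and Post bars of a question sit next to each other.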
