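I have student data from several schools. I want to use R to display a histogram of the percentage of students who passed a test at each school. My data looks like this (id, school, passed/failed):
432342 school1 passed
454233 school2 failed
543245 school1 failed
etc.
(The point is that I am only interested in the percentage of students who passed; obviously, those who did not pass failed. I would like one bar per school, showing the percentage of students at that school who passed.)
Thanks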
There are many ways to do this. One is:
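# Simulated example data: 100 students spread over 3 schools
df <- data.frame(ID = sample(100),
                 school = factor(sample(3, 100, TRUE), labels = c("School1", "School2", "School3")),
                 result = factor(sample(2, 100, TRUE), labels = c("passed", "failed")))
# The mean of a logical vector is the proportion of TRUE values, i.e. the pass rate per school
p <- aggregate(df$result == "passed" ~ school, data = df, FUN = mean)
# Column 1 holds the school, column 2 the proportion passed
barplot(p[, 2] * 100, names.arg = as.character(p[, 1]))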
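As a quick cross-check of the aggregate() call above, the same percentages can be read off a row-normalised contingency table. This is only a minimal sketch, assuming the same column names (school, result) as the simulated data; the set.seed() call, the name pct_passed and the axis settings are my own additions for illustration.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible

# Same simulated data layout as in the answer above
df <- data.frame(
  ID = sample(100),
  school = factor(sample(3, 100, TRUE), labels = c("School1", "School2", "School3")),
  result = factor(sample(2, 100, TRUE), labels = c("passed", "failed"))
)

# margin = 1 normalises each row, so every school's row sums to 1
tab <- prop.table(table(df$school, df$result), margin = 1)
pct_passed <- tab[, "passed"] * 100

# The aggregate() approach, for comparison
p <- aggregate(df$result == "passed" ~ school, data = df, FUN = mean)

# A fixed 0-100 axis keeps the bars comparable across schools
barplot(pct_passed, ylim = c(0, 100), ylab = "% passed")
```

Both pct_passed and p[, 2] * 100 should hold the same values, in the order of the school factor levels.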
My previous answer didn't quite work. Here is a redo. The example data is taken from @eyjo's answer.
students <- 400
schools <- 5
df <- data.frame(
  id = 1:students,
  school = sample(paste("school", 1:schools, sep = ""), size = students, replace = TRUE),
  results = sample(c("passed", "failed"), size = students, replace = TRUE, prob = c(.8, .2)))
# Count passed/failed per school; the counts come back as a matrix column
r <- aggregate(results ~ school, FUN = table, data = df)
# "Flatten" the result, keeping school as a label rather than coercing it
r <- data.frame(school = r$school, r$results)
r$sum <- r$passed + r$failed
r$perc.passed <- round(with(r, (passed / sum) * 100), 0)
library(ggplot2)
ggplot(r, aes(x = school, y = perc.passed)) +
  theme_bw() +
  geom_bar(stat = "identity")
Since you have individual records (id) and want to compute a value per index (school), I suggest using tapply for this.
students <- 400
schools <- 5
df <- data.frame("id" = 1:students,
                 "school" = sample(paste("school", 1:schools, sep = ""),
                                   size = students, replace = TRUE),
                 "results" = sample(c("passed", "failed"),
                                    size = students, replace = TRUE, prob = c(.8, .2)))
p <- tapply(df$results == "passed", df$school, mean) * 100
barplot(p)
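tapply returns a named numeric vector (one element per school), which is why barplot(p) labels the bars without needing names.arg. If it helps readability, the bars can be sorted and the axis fixed to 0-100. This is a rough sketch under the same simulated data; the seed, the name p_sorted and the plot settings are assumptions of mine, not part of the answer above.

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible

students <- 400
schools <- 5
df <- data.frame(
  id = 1:students,
  school = sample(paste("school", 1:schools, sep = ""),
                  size = students, replace = TRUE),
  results = sample(c("passed", "failed"),
                   size = students, replace = TRUE, prob = c(.8, .2))
)

# Percentage passed per school, as a named vector
p <- tapply(df$results == "passed", df$school, mean) * 100

# Sort from highest to lowest pass rate; the names travel with the values
p_sorted <- sort(p, decreasing = TRUE)

# las = 2 turns the axis labels perpendicular to the axis so school names do not overlap
barplot(p_sorted, ylim = c(0, 100), ylab = "Students passed (%)", las = 2)
```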