I'm hoping someone can help me reproduce the plot "Percent of Wells Functional: 0 < amount_tsh <= 10000" from this page: https://rstudio-pubs-static.s3.amazonaws.com/339668_006f4906390e41cea23b3b786cc0230a.html

To help locate it, the plot appears just below this text:

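"The bar plot of amount_tsh values in the range (1:10000) suggests that pumps with relatively large values may be more likely to be functional than pumps with smaller values. However, since the larger values may be outliers and we lack a valid amount_tsh value for more than 70% of the records in the data set, it seems highly unlikely that we can draw any valid predictive inference from the plot."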

The following code sets up the project:

# Define train_values_url
train_values_url <- "http://s3.amazonaws.com/drivendata/data/7/public/4910797b-ee55-40a7-8668-10efd5c1b960.csv"

# Import train_values

train_values <- read.csv(train_values_url)

# Define train_labels_url
train_labels_url <- "http://s3.amazonaws.com/drivendata/data/7/public/0bf8bc6e-30d0-4c50-956a-603fc693d966.csv"

# Import train_labels
train_labels <- read.csv(train_labels_url)

# Define test_values_url
test_values_url <- "http://s3.amazonaws.com/drivendata/data/7/public/702ddfc5-68cd-4d1d-a0de-f5f566f76d91.csv"

# Import test_values
test_values <- read.csv(test_values_url)

# Merge data frames to create the data frame train
train <- merge(train_labels, train_values)
test <- test_values
rm(train_labels)  
rm(train_values)
rm(test_values)
rm(train_labels_url)
rm(train_values_url)
rm(test_values_url)

You can see the number of functional wells with:

table(train$status_group)

# As proportions
prop.table(table(train$status_group))

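To produce the plot, I tried the following code, expecting the proportion of each status group to be "binned" by the geom_col width of 500, but it did not work: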

# Packages needed for %>%, filter() and ggplot()
library(dplyr)
library(ggplot2)

train %>%
  filter(amount_tsh > 1000 & amount_tsh <= 10000) %>%
  ggplot(aes(x = amount_tsh, y = sum(status_group == "functional") / sum(status_group %in% c("functional", "non functional", "functional needs repair")))) +
  geom_col(width = 500)
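
One likely reason the attempt above does not bin anything: the `sum()` expressions inside `aes()` are evaluated over the whole filtered data frame, so every row gets the same ratio, and `geom_col()` only sets how wide each bar is drawn; it does not group rows. A minimal sketch of explicit binning is shown here, not necessarily the method used on the linked page. It assumes a bin width of 500 (taken from the `width=500` in the attempt), a lower bound of 0 (from the plot title rather than the `>1000` in the attempt), and uses a small made-up data frame `toy` as a stand-in for `train` so that it runs without downloading anything.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `train`; replace `toy` with `train` for the real data.
set.seed(1)
toy <- data.frame(
  amount_tsh = runif(200, min = 0, max = 12000),
  status_group = sample(
    c("functional", "non functional", "functional needs repair"),
    200, replace = TRUE
  )
)

bin_width <- 500  # assumed bin width

binned <- toy %>%
  filter(amount_tsh > 0 & amount_tsh <= 10000) %>%
  # Left edge of the (lower, upper] bin that each row falls into
  mutate(bin = (ceiling(amount_tsh / bin_width) - 1) * bin_width) %>%
  group_by(bin) %>%
  # Percentage of wells in each bin whose status is "functional"
  summarise(pct_functional = 100 * mean(status_group == "functional"),
            n = n())

# Centre each bar on the middle of its bin
p <- ggplot(binned, aes(x = bin + bin_width / 2, y = pct_functional)) +
  geom_col(width = bin_width) +
  labs(x = "amount_tsh", y = "Percent of wells functional")
print(p)
```

The `n` column records how many wells fall into each bin, which helps when judging whether a tall bar at a large `amount_tsh` rests on only a handful of records.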
