
My dataset is Churn_Modeling.

I want to create a column named c_rating with the following ranges: <500 = "very poor", 500-600 = "poor", 601-660 = "fair", 661-780 = "good", and >=780 = "excellent".

library(tidyverse)
library(reticulate)
library(readxl)
library(modelr)
library(ggplot2)
library(dplyr)
churn <- read.csv("Churn_Modeling.csv")
churn$CreditScore <- as.numeric(churn$CreditScore)
class(churn$CreditScore)
churn$c_rating <- cut(churn$CreditScore, c(-Inf, 500, 600, 601, 660, 661, 780, Inf),
                      levels=c('<=500', '500-600', '601-660', '661-780', '>780'))

churn$c_rating

My output does not create the c_rating column the way I intended. Any ideas?


2 Answers


Use mutate() and case_when().

library(tidyverse)

churn <- read.csv("Churn_Modeling.csv")
churn<-churn %>% mutate(c_rating=case_when(CreditScore<500~"very poor", 
                                           CreditScore>=500 & CreditScore<=600~"poor", 
                                           CreditScore>=601 & CreditScore<=660~"fair", 
                                           CreditScore>=661 & CreditScore<=780~"good", 
                                           CreditScore> 780 ~ "excellent"))
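
The cut() call from the question can also be made to work. Here is a minimal sketch, assuming the scores are whole numbers: cut() takes its interval names through the `labels` argument (it has no `levels` argument), and five labels need exactly six break points. The `scores` vector is made-up sample data for illustration, not values from Churn_Modeling.csv.

```r
# Hypothetical sample scores chosen to hit each boundary; not from the real file.
scores <- c(350, 499, 500, 600, 601, 660, 661, 780, 781, 850)

# Six break points define five intervals. With right = TRUE (the default) every
# interval is closed on the right, so a break at 499 means "score <= 499",
# which equals "< 500" for whole-number scores.
c_rating <- cut(
  scores,
  breaks = c(-Inf, 499, 600, 660, 780, Inf),
  labels = c("very poor", "poor", "fair", "good", "excellent")
)

# Show each score next to its rating.
data.frame(CreditScore = scores, c_rating = c_rating)
```

On the real data the same `breaks` and `labels` could be passed to cut(churn$CreditScore, ...). The result is a factor whose levels follow the order of the labels.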
Answered 2020-12-11T21:28:19.107

Nicolas Ratto's answer is very good. Another approach is to first write a user-defined function and then apply it with lapply(). Here is an example.

churn <- read.csv("Churn_Modeling.csv")

churn$CreditScore <- as.numeric(churn$CreditScore)

C_Rating = function(score){
  if (score < 500) 
    rating = "Very Poor"
  else if (score >= 500 & score <= 600)
    rating = "Poor"
  else if (score >= 601 & score <= 660)
    rating = "Fair"
  else if(score >= 661 & score <= 780)
    rating = "Good"
  else
    rating = "Excellent"
  
  return(rating)

}

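# lapply() returns a list, so unlist() turns the result into a character vector.
# Calling lapply() directly also avoids %>%, since no package is loaded here.
churn$c_rating <- unlist(lapply(churn$CreditScore, C_Rating))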
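
One caveat with this approach: lapply() always returns a list, while vapply() returns a plain vector of a declared type, which is usually what you want in a data frame column. The sketch below shows the difference. `rate_score` is a hypothetical, condensed stand-in for the function above, and `scores` is made-up sample data.

```r
# Hypothetical helper: a condensed version of the if/else chain, for one score.
rate_score <- function(score) {
  if (score < 500) "Very Poor"
  else if (score <= 600) "Poor"
  else if (score <= 660) "Fair"
  else if (score <= 780) "Good"
  else "Excellent"
}

scores <- c(450, 550, 640, 700, 800)  # made-up sample scores

# lapply() wraps each result in a list element.
as_list <- lapply(scores, rate_score)

# vapply() checks that every result is a single string and returns a vector.
as_vector <- vapply(scores, rate_score, character(1))

class(as_list)    # "list"
class(as_vector)  # "character"
```

With the real data, vapply(churn$CreditScore, rate_score, character(1)) would produce an ordinary character column.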
Answered 2020-12-11T21:42:51.843