0

我有一个名为“control.scores”的 tbl_df(tibble),它有一个名为“Overall”的列,它是介于 1.00 和 4.00 之间的某个值。

# A tibble: 2 x 8
      group   GOV  CORC  TMSC     AUDIT   PPS     TRAIN  Overall
      <chr> <dbl> <dbl> <dbl>     <dbl> <dbl>     <dbl>    <dbl>
1 Group4  0.82   0.2 0.525 0.2833333   0.2 0.2333333 2.261667
2 Group5  0.82   0.0 0.525 0.2833333   0.2 0.2333333 2.061667

我还有另一个 tbl_df,称为 'control.rating.tbl' 构造:

# create a reference table of control ratings and numeric ranges
control.ref.tbl <- tribble(
  ~RATING, ~MIN, ~MAX,
  "Ineffective", 3.500, 4.00,
  "Marginally Effective",2.500 ,3.499,
  "Generally Effective", 1.500 ,2.499,
  "Highly Effective", 1.00, 1.499
)

如何在“control.scores”中再追加一列,该列使用“Overall”中的值并检查其在“control.rating.tbl”的 MIN 和 MAX 范围之间的位置并返回对应的字符串?

例如,Group4_Overall == '2.261667,对应于 'control.rating.tbl' 中的 'Generally Effective'。它看起来像这样:

# A tibble: 2 x 8
      group   GOV  CORC  TMSC     AUDIT   PPS     TRAIN  Overall  Rating
      <chr> <dbl> <dbl> <dbl>     <dbl> <dbl>     <dbl>    <dbl>
1 Group4  0.82   0.2 0.525 0.2833333   0.2 0.2333333 2.261667  Generally Effective
2 Group5  0.82   0.0 0.525 0.2833333   0.2 0.2333333 2.061667  Generally Effective
4

1 回答 1

2

我们可以考虑使用case_whenfrom dplyr。请注意,我稍微更改了您的分类范围,因为您的原始分类存在差距。例如,3.505 将没有任何基于您的原始分类的关联类。dt2是最终的输出。

library(dplyr)

dt2 <- dt %>%
  mutate(Rating = case_when(
    Overall > 3.5 & Overall <= 4.00  ~ "Ineffective",
    Overall > 3 & Overall <= 3.5     ~ "Marginally Effective",
    Overall > 2.5 & Overall <= 3     ~ "Generally Effective",
    Overall >= 1 & Overall <= 2.5    ~ "Highly Effective"
  ))

数据:

dt <- read.table(text = "group   GOV  CORC  TMSC     AUDIT   PPS     TRAIN  Overall
1 Group4  0.82   0.2 0.525 0.2833333   0.2 0.2333333 2.261667
                 2 Group5  0.82   0.0 0.525 0.2833333   0.2 0.2333333 2.061667",
                 header = TRUE, stringsAsFactors = FALSE)
于 2017-08-23T06:03:53.940 回答