
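In a new column, I want to flag which missing records each join filled in.

Goal: I have a dataset with missing classification codes. To replace the missing values, I run several left_join/coalesce operations that swap each NA for the correct code. I want to track which values were changed during each iteration.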

# DATA
library(tibble)

df <- tibble(
  x = c(1, 2, 3, NA, NA),  # original data
  y = c(1, NA, 3, 4, NA)   # new data from the join
)

# A tibble: 5 x 2
      x     y
  <dbl> <dbl>
1     1     1
2     2    NA
3     3     3
4    NA     4
5    NA    NA

I would like to see...

# A tibble: 5 x 2
      x changed  
  <dbl> <chr>    
1     1 no.change
2     2 no.change
3     3 no.change
4     4 corrected
5    NA no.change

1 Answer


You could use case_when:

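library(tidyverse)

df %>%
  # Fill gaps in x with the joined value from y
  mutate(new = coalesce(x, y)) %>%
  mutate(changed = case_when(
    # x already held this value, or there was nothing to fill it with
    x == new | is.na(new) ~ "no.change",
    # otherwise x was NA and y supplied the value
    TRUE ~ "corrected")) %>%
  select(new, changed) # %>% rename(x = new)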

Result

# A tibble: 5 x 2
#    new changed  
#  <dbl> <chr>    
#1     1 no.change
#2     2 no.change
#3     3 no.change
#4     4 corrected
#5    NA no.change
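
Since the question mentions several left_join/coalesce passes, here is a rough sketch of how the same idea might be wrapped in a helper so each pass labels the values it filled. The `id` key, the `code`/`code_new` column names, the `lookup1`/`lookup2` tables and the `fill_and_flag` helper are all made up for illustration; they are not from the question.

```r
library(dplyr)

# Hypothetical data: `code` has gaps; lookup1/lookup2 stand in for the
# tables used in the successive joins (assumed names and values).
df <- tibble(id = 1:5, code = c(1, 2, 3, NA, NA))
lookup1 <- tibble(id = c(1L, 3L, 4L), code_new = c(1, 3, 4))
lookup2 <- tibble(id = 5L, code_new = 9)

# Hypothetical helper: join one lookup, fill the gaps with coalesce(),
# and record in `changed` which pass supplied each value.
fill_and_flag <- function(data, lookup, label) {
  data %>%
    left_join(lookup, by = "id") %>%
    mutate(
      # TRUE only where code was missing and this lookup has a value
      filled  = is.na(code) & !is.na(code_new),
      code    = coalesce(code, code_new),
      changed = if_else(filled, label, changed)
    ) %>%
    select(-code_new, -filled)
}

result <- df %>%
  mutate(changed = "no.change") %>%
  fill_and_flag(lookup1, "corrected.pass1") %>%
  fill_and_flag(lookup2, "corrected.pass2")
```

With this made-up data, row 4 ends up labelled "corrected.pass1" and row 5 "corrected.pass2", while rows 1 to 3 stay "no.change". The `filled` flag is computed before `code` is overwritten, which is what lets each pass tell its own fills apart from earlier ones.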
answered 2018-08-31T20:03:19.007