
I have two tables. This one has the old names:

Last Name | First Name | ID
Clay      | Cassius    | 1
Alcindor  | Lou        | 2
Artest    | Ron        | 3
Jordan    | Michael    | 4
Scottie   | Pippen     | 5
Kanter    | Enes       | 6

And this one has the new names:

Last Name    | First Name  | ID
Ali          | Muhammad    | 1
Abdul Jabbar | Kareem      | 2
World Peace  | Metta       | 3
Jordan       | Michael     | 4
Pippen       | Scottie     | 5
Freedom      | Enes Kanter | 6

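Basically, I want to join the second table onto the first one (old names) so that, where a name has changed, a new column shows the new last name, and otherwise it is blank: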

Last Name | First Name | ID | Discrepancies
Clay      | Cassius    | 1  | Ali
Alcindor  | Lou        | 2  | Abdul Jabbar
Artest    | Ron        | 3  | World Peace
Jordan    | Michael    | 4  |
Pippen    | Scottie    | 5  |
Kanter    | Enes       | 6  | Freedom

Note that Michael's and Scottie's names did not change, so Discrepancies is blank for them.


3 Answers


You could use

library(dplyr)

df1 %>% 
  left_join(df2, by = "ID", suffix = c("", ".y")) %>% 
  mutate(Discrepancies = ifelse(Last_Name.y == Last_Name, "", Last_Name.y)) %>% 
  select(-ends_with(".y"))

to get

# A tibble: 6 x 4
  Last_Name First_Name    ID Discrepancies 
  <chr>     <chr>      <dbl> <chr>         
1 Clay      Cassius        1 "Ali"         
2 Alcindor  Lou            2 "Abdul Jabbar"
3 Artest    Ron            3 "World Peace" 
4 Jordan    Michael        4 ""            
5 Scottie   Pippen         5 "Pippen"      
6 Kanter    Enes           6 "Freedom" 

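Notes:

  • I named the columns Last_Name and First_Name.
  • The first data frame contains Scottie Pippen instead of Pippen Scottie, which is why row 5 is reported as a discrepancy.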
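
The pipeline above assumes that df1 and df2 already exist. A minimal sketch of how they might be built, with the values copied from the question and the column names taken from the first note; the integer ID column is an assumption (the printed output shows a double column, which joins the same way):

```r
# Assumed inputs for the pipeline above. Values come from the question;
# row 5 of df1 keeps the swapped "Scottie"/"Pippen" exactly as posted.
df1 <- data.frame(
  Last_Name  = c("Clay", "Alcindor", "Artest", "Jordan", "Scottie", "Kanter"),
  First_Name = c("Cassius", "Lou", "Ron", "Michael", "Pippen", "Enes"),
  ID         = 1:6,  # assumption: integer IDs
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  Last_Name  = c("Ali", "Abdul Jabbar", "World Peace", "Jordan", "Pippen", "Freedom"),
  First_Name = c("Muhammad", "Kareem", "Metta", "Michael", "Scottie", "Enes Kanter"),
  ID         = 1:6,  # assumption: integer IDs
  stringsAsFactors = FALSE
)
```

Both data frames share the same column names, which is what makes the suffix argument of left_join() necessary.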
Answered 2021-12-02T22:49:51.840

Another possible solution:

library(tidyverse)

old <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Last = c("Clay",
           "Alcindor","Artest","Jordan","Scottie","Kanter"),
  `First` = c("Cassius","Lou",
              "Ron","Michael","Pippen","Enes"),
  `ID` = c(1L, 2L, 3L, 4L, 5L, 6L)
)

new <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  `Last` = c("Ali",
             "Abdul Jabbar","World Peace","Jordan","Pippen","Freedom"),
  `First` = c("Muhammad",
              "Kareem","Metta","Michael","Scottie","Enes Kanter"),
  ID = c(1L, 2L, 3L, 4L, 5L, 6L)
)

old %>% 
  bind_rows(new) %>% 
  group_by(ID) %>% 
  summarise(
    discrepancies = if_else(n_distinct(Last) > 1, last(Last), NA_character_), 
    Last = first(Last), First = first(First), .groups = "drop" )

#> # A tibble: 6 × 4
#>      ID discrepancies Last     First  
#>   <int> <chr>         <chr>    <chr>  
#> 1     1 Ali           Clay     Cassius
#> 2     2 Abdul Jabbar  Alcindor Lou    
#> 3     3 World Peace   Artest   Ron    
#> 4     4 <NA>          Jordan   Michael
#> 5     5 Pippen        Scottie  Pippen 
#> 6     6 Freedom       Kanter   Enes
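
This result uses NA where the name is unchanged, while the question asks for a blank. A minimal sketch of that final step in base R, using a hypothetical two-row stand-in called res for the summarised result (inside the dplyr pipeline, something like coalesce(discrepancies, "") would presumably do the same job):

```r
# Hypothetical stand-in for the summarised result (two rows only):
# one changed name and one unchanged name.
res <- data.frame(
  ID = c(1L, 4L),
  discrepancies = c("Ali", NA_character_),
  stringsAsFactors = FALSE
)

# Replace missing discrepancies with an empty string.
res$discrepancies[is.na(res$discrepancies)] <- ""
res
```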
Answered 2021-12-03T01:06:00.893

You can simply merge() your data and then blank out the entries where the last name is unchanged.

dfinal <- setNames( merge( dat1, dat2, "ID", suffixes=c(1,2) )[
  ,c("Last.Name1","First.Name1","ID","Last.Name2")], c(colnames(dat1),"Discrepancies")  )

dfinal$Discrepancies[ dfinal$Last.Name == dfinal$Discrepancies ] <- ""

dfinal
  Last.Name First.Name ID Discrepancies
1      Clay    Cassius  1           Ali
2  Alcindor        Lou  2  Abdul Jabbar
3    Artest        Ron  3   World Peace
4    Jordan    Michael  4
5   Scottie     Pippen  5        Pippen
6    Kanter       Enes  6       Freedom

Data

dat1 <- structure(list(Last.Name = c("Clay", "Alcindor", "Artest", "Jordan",
"Scottie", "Kanter"), First.Name = c("Cassius", "Lou", "Ron",
"Michael", "Pippen", "Enes"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))

dat2 <- structure(list(Last.Name = c("Ali", "Abdul Jabbar", "World Peace",
"Jordan", "Pippen", "Freedom"), First.Name = c("Muhammad", "Kareem",
"Metta", "Michael", "Scottie", "Enes Kanter"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))
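
As a variation on the merge() approach, the lookup can also be sketched with match(), which keeps the rows of the old table in their original order and avoids renaming suffixed columns. This is an alternative under the same assumptions, not the answer's own method; the data is repeated in compact form so the block runs on its own, and new_last is a helper name introduced here for illustration:

```r
# Same data as above, written compactly so this block is self-contained.
dat1 <- data.frame(
  Last.Name  = c("Clay", "Alcindor", "Artest", "Jordan", "Scottie", "Kanter"),
  First.Name = c("Cassius", "Lou", "Ron", "Michael", "Pippen", "Enes"),
  ID = 1:6, stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Last.Name  = c("Ali", "Abdul Jabbar", "World Peace", "Jordan", "Pippen", "Freedom"),
  First.Name = c("Muhammad", "Kareem", "Metta", "Michael", "Scottie", "Enes Kanter"),
  ID = 1:6, stringsAsFactors = FALSE
)

# Look up each old row's new last name by ID (NA if the ID is absent in dat2).
new_last <- dat2$Last.Name[match(dat1$ID, dat2$ID)]

# Keep the new last name only where it exists and differs from the old one.
dat1$Discrepancies <- ifelse(
  !is.na(new_last) & new_last != dat1$Last.Name, new_last, ""
)
dat1
```

With the data as posted, row 5 is still flagged because the old table stores "Scottie" in the last-name column.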
Answered 2021-12-03T01:18:09.757