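I am trying to flag duplicates in email data when the username (the part before the @) is repeated in other email addresses. My code works, but when several entries share the same name, all of them should be flagged as duplicates.
So, for example, if abcd@ddd.com has other entries such as abcd@ccc.com or abcd@fff.com, all three entries should be flagged as duplicates.
Likewise, if abby.del@ddd.com has other entries such as abby-del@ccc.com or abby_del@fff.com, all three entries should be flagged as duplicates.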
library(dplyr)
library(stringr)

# Sample data; note that some missing values are the string "NA" rather than a real NA
df <- data.frame(
  EMP.ID = c(88111,"BBB4477","BBB4058","BBB5832","BBB0338","BBB1814","BBB6543",875430,875970,"BBB0243","BBB1943","BBB9344","BBB9701","BBB1814","BBB8648","BBB4373","BBB7270","BBB6165","BBB7460","BBB7528","BBB6092"),
  name = c("link adam","dy tt","link adam","gbesada","dojeda","slew lang","?alpucheta","r zona","jachaval","allo nyyn","mbautis","grand fring","jali","kintom dang","namoti","shan mig","NA","NA","NA","NA",NA),
  email = c("link.adam@gmail.com","dy.tt@abcd.com","link_adam@gmail.com","gbesada@abcd.com","dojeda@abcd.com","?slew.lang@abcd.com","dy-tt@abcd.com","?rzona@abcd.com","jachaval@abcd.com","allo@abcd.com","mbautis@abcd.com","grand.fring@abcd.com","jali@abcd.com","kintom.dang@abcd.com","namoti@abcd.com","shan.mig@abcd.com","mbautis@XYZ.com","?slew.lang@abcd.com",NA,"NA",NA)
)

separator <- " "
valuesToIgnore <- NA  # missing usernames should never count as duplicates

df <- df %>%
  # Lowercase both columns so the comparison is case-insensitive
  mutate(across(c(name, email), tolower)) %>%
  # Take the run of letters and dots immediately before the "@"
  mutate(email_name1 = str_extract(email, "([a-z.]+)(?=@.+)")) %>%
  # Replace dots in the username with the separator
  mutate(email_name1 = str_replace_all(email_name1, "\\.", separator)) %>%
  # duplicated() marks only the second and later occurrences of each username
  mutate(`13. duplicate name with mailid` = ifelse(duplicated(email_name1, incomparables = valuesToIgnore), "Duplicate email username exists", NA))
I have tried many solutions. Is there a robust, general way to handle email data like this?
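One possible direction, offered as a minimal sketch rather than a definitive fix: normalise the part before the @ so that `.`, `_`, `-` and stray characters such as `?` all collapse to the same separator, then flag every row whose normalised key occurs more than once by combining `duplicated()` with `duplicated(fromLast = TRUE)`. The data frame `emails` and the columns `email_key` and `duplicate_flag` are hypothetical names for illustration, and treating the literal string "NA" as missing is an assumption about the data.

```r
library(dplyr)
library(stringr)

# Hypothetical sample, reduced from the data frame in the question
emails <- data.frame(
  email = c("link.adam@gmail.com", "link_adam@gmail.com", "dy.tt@abcd.com",
            "dy-tt@abcd.com", "mbautis@abcd.com", "mbautis@XYZ.com",
            "?slew.lang@abcd.com", "jali@abcd.com", NA, "NA"),
  stringsAsFactors = FALSE
)

flagged <- emails %>%
  mutate(
    # Lowercase, and treat the literal string "NA" as missing (assumption)
    email_clean = na_if(str_to_lower(email), "na"),
    # Username: everything before the first "@"
    email_key = str_extract(email_clean, "^[^@]+(?=@)"),
    # Collapse any run of non-alphanumeric characters to one space, then trim
    email_key = str_squish(str_replace_all(email_key, "[^a-z0-9]+", " ")),
    # TRUE for every member of a repeated key, including the first occurrence
    is_dup = !is.na(email_key) &
      (duplicated(email_key) | duplicated(email_key, fromLast = TRUE)),
    duplicate_flag = if_else(is_dup, "Duplicate email username exists", NA_character_)
  )

print(flagged[, c("email", "email_key", "duplicate_flag")])
```

In this sample the link adam, dy tt and mbautis pairs are flagged on both rows, while the single slew lang and jali rows and the missing values are left as NA. Scanning from both ends is what catches the first occurrence; a grouped count such as `add_count()` would be an alternative way to get the same result.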