A subset of the data frame:

             country1                 country2
                Japan                    Japan
          Netherlands                     <NA>
                 <NA>                     <NA>
               Brazil                   Brazil
   Russian Federation                     <NA>
                 <NA>                     <NA>
                 <NA> United States of America
              Germany                  Germany
              Ukraine                     <NA>
                Japan                    Japan
                 <NA>       Russian Federation
                 <NA> United States of America
               France                   France
          New Zealand              New Zealand
                Japan                     <NA>

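I have two character vectors, country1 and country2, that I want to combine into a single new column. No observation in my dataset has two different countries. However, some pairs contain the same value twice, and I only want it to appear once. There is also the NA issue: I want NAs left out of the merged column, so that each value in the new column is just the country string. Some observations have NA in both columns, and for those I just want NA in the new column. What is the best way to do this?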

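I used a slightly modified version of the function from the top-voted answer to a similar question here, changing the separator from a comma to an empty string.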

However, that leaves the duplicates unresolved:

             country1                 country2                                            merge
                Japan                    Japan                                       JapanJapan
          Netherlands                     <NA>                                      Netherlands
                 <NA>                     <NA>                                             <NA>
               Brazil                   Brazil                                     BrazilBrazil
   Russian Federation                     <NA>                               Russian Federation
                 <NA>                     <NA>                                             <NA>
                 <NA> United States of America                         United States of America
              Germany                  Germany                                   GermanyGermany
              Ukraine                     <NA>                                          Ukraine
                Japan                    Japan                                       JapanJapan
                 <NA>       Russian Federation                               Russian Federation
                 <NA> United States of America                         United States of America
               France                   France                                     FranceFrance
          New Zealand              New Zealand                           New ZealandNew Zealand
                Japan                     <NA>                                            Japan
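
If you want to keep a paste-style merge like the one above, one fix is to drop NAs and repeated values within each row before collapsing. This is a minimal sketch, assuming the data are the two character vectors shown in the question; merge_unique is a hypothetical helper name, not the function from the linked answer. Note that it would glue two different countries together (e.g. "JapanBrazil"), which is only acceptable because no row has conflicting values.

```r
# Example data: the same values as the subset in the question
country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA,
              "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America",
              "Germany", NA, "Japan", "Russian Federation", "United States of America",
              "France", "New Zealand", NA)

# Hypothetical helper: for each row, drop NAs, keep distinct values, then collapse.
# Rows where both inputs are NA stay NA.
merge_unique <- function(a, b) {
  mapply(function(x, y) {
    vals <- c(x, y)
    vals <- unique(vals[!is.na(vals)])
    if (length(vals) == 0) NA_character_ else paste(vals, collapse = "")
  }, a, b, USE.NAMES = FALSE)  # USE.NAMES = FALSE avoids naming results by country1
}

merge_unique(country1, country2)
```

The per-row mapply() call is easy to read but slower than the vectorised answers below on large data.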

3 Answers

Since you specified dplyr, here is a one-liner:

df <- dplyr::mutate(df, merge = dplyr::if_else(is.na(country1), country2, country1))

Data:

country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA, "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America", "Germany", NA, "Japan", "Russian Federation", "United States of America", "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = FALSE)
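
The one-liner relies on the question's statement that no row holds two different countries: when both columns are filled it simply takes country1. A quick base-R check of that assumption, plus the equivalent base ifelse() version, might look like this (a sketch; n_conflicts is just an illustrative name):

```r
country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA,
              "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America",
              "Germany", NA, "Japan", "Russian Federation", "United States of America",
              "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = FALSE)

# Rows where both columns are filled but disagree; should be 0 for this data.
# FALSE & NA is FALSE, so rows with an NA never count as conflicts.
n_conflicts <- with(df, sum(!is.na(country1) & !is.na(country2) & country1 != country2))
n_conflicts

# Base-R equivalent of the dplyr::if_else() one-liner
df$merge <- ifelse(is.na(df$country1), df$country2, df$country1)
```

dplyr::if_else() is stricter than ifelse() in that both branches must have the same type, which helps catch accidental factor or numeric columns.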
answered 2018-03-02T15:46:35.990

You can also replace the NA values in column 1 with the corresponding values from column 2:

df$country1[is.na(df$country1)] <- df$country2[is.na(df$country1)]
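
This overwrites country1 in place. A minimal sketch of the same indexing idea that writes to a new merge column instead, leaving both original columns untouched (the fill variable name is just illustrative):

```r
country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA,
              "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America",
              "Germany", NA, "Japan", "Russian Federation", "United States of America",
              "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = FALSE)

# Start from country1, then fill only its NA positions from country2
df$merge <- df$country1
fill <- is.na(df$merge)
df$merge[fill] <- df$country2[fill]
```

Computing the NA mask once also avoids evaluating is.na() twice, as the one-liner above does.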
answered 2018-03-02T16:07:08.353

Since you said you have character vectors:

library(tidyverse)  # attaches dplyr, which provides coalesce()
coalesce(country1, country2)
 [1] "Japan"                    "Netherlands"              NA                        
 [4] "Brazil"                   "Russian Federation"       NA                        
 [7] "United States of America" "Germany"                  "Ukraine"                 
[10] "Japan"                    "Russian Federation"       "United States of America"
[13] "France"                   "New Zealand"              "Japan"   

If it's a data frame, just do coalesce(!!!df).
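
coalesce() returns, position by position, the first non-missing value across its arguments, and coalesce(!!!df) splices every column of df in as a separate argument. If df holds anything besides the country columns, restrict it first, e.g. coalesce(!!!df[c("country1", "country2")]). As a base-R sketch of the same left-to-right idea (not dplyr's implementation), Reduce() can fold the columns with a pairwise "keep left unless NA" step; first_non_na is a hypothetical helper name:

```r
country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA,
              "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America",
              "Germany", NA, "Japan", "Russian Federation", "United States of America",
              "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = FALSE)

# Pairwise step: keep the left value unless it is NA, otherwise take the right one
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

# Fold over the selected columns from left to right (works for any number of columns)
merged <- Reduce(first_non_na, df[c("country1", "country2")])
merged
```

This assumes character columns; with factor columns ifelse() would return the underlying integer codes, which is one reason stringsAsFactors = FALSE matters here.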

answered 2018-03-02T16:03:49.100