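I have a larger existing data frame. For this smaller example, I want to replace some values (the state in df1) with new states (from df2), matching on the "first" column. My problem is that values come back as NA, because only some of the names have a match in the new data frame (df2).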

Existing data frame:

state = c("CA","WA","OR","AZ")
first = c("Jim","Mick","Paul","Ron")
df1 <- data.frame(first, state)

      first state
    1   Jim    CA
    2  Mick    WA
    3  Paul    OR
    4   Ron    AZ

New data frame to match against the existing one:

state = c("CA","WA")
newstate = c("TX", "LA")
first =c("Jim","Mick")
df2 <- data.frame(first, state, newstate)

  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA

I tried using match, but "state" comes back as NA for every row of the original data frame whose "first" value has no match in df2.

df1$state <- df2$newstate[match(df1$first, df2$first)]

  first state
1   Jim    TX
2  Mick    LA
3  Paul  <NA>
4   Ron  <NA>

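Is there a way to ignore the nomatch, or have nomatch return the existing value as-is? This is the desired result: the states for Jim and Mick are updated, while the states for Paul and Ron are unchanged.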

      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul    OR
    4   Ron    AZ

3 Answers


Is this what you want? By the way, unless you really want factors, use stringsAsFactors = FALSE in the data.frame call. Note the use of nomatch = 0 in the match call.

> state = c("CA","WA","OR","AZ")
> first = c("Jim","Mick","Paul","Ron")
> df1 <- data.frame(first, state, stringsAsFactors = FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors = FALSE)
> df1
  first state
1   Jim    CA
2  Mick    WA
3  Paul    OR
4   Ron    AZ
> df2
  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA
> 
> # create an index for the matches
> indx <- match(df1$first, df2$first, nomatch = 0)
> df1$state[indx != 0] <- df2$newstate[indx]
> df1
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
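
A variant of the same idea, as a minimal sketch that assumes both columns are character vectors: keep the default NA that match returns and fall back to the existing value wherever there is no match. The name idx is only illustrative.

```r
# Rebuild the example data; character columns are assumed
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# idx is NA wherever df1$first has no counterpart in df2$first
idx <- match(df1$first, df2$first)

# Take the new state where a match exists, otherwise keep the old one
df1$state <- ifelse(is.na(idx), df1$state, df2$newstate[idx])
df1
```

With factor columns this would likely misbehave, since ifelse drops the factor attributes, which is one more reason to prefer character vectors here.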
answered 2014-10-04T23:08:37.303

I think you will get better behavior using character vectors than using factors.

> df1 <- data.frame(first, state,stringsAsFactors=FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors=FALSE)
> df1[ match(df2$first, df1$first ), "state"] <- df2$newstate
> df1
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
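
One caveat with matching in this direction: it relies on every name in df2 also being present in df1. If df2 held a name that df1 lacks, the row index would contain NA, which as far as I know R rejects in a subscripted assignment. A hedged sketch that drops such rows first is below; the extra "Zed" row and the names pos and found are made up for illustration.

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
# Hypothetical df2 with one name ("Zed") that does not occur in df1
df2 <- data.frame(first = c("Jim", "Mick", "Zed"),
                  newstate = c("TX", "LA", "NV"),
                  stringsAsFactors = FALSE)

# Row positions in df1 for each df2 name; NA for names df1 lacks
pos <- match(df2$first, df1$first)
found <- !is.na(pos)

# Assign only the rows of df2 that were actually found in df1
df1$state[pos[found]] <- df2$newstate[found]
df1
```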
answered 2014-10-04T04:14:36.750

library(data.table)
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)


setkey(DT1, first, state)
setkey(DT2, first, state)

DT1[DT2]
#    first state newstate
# 1:   Jim    CA       TX
# 2:  Mick    WA       LA

Note that [.data.table also has a nomatch argument, i.e.:

DT2[DT1, nomatch=0]
#    first state newstate
# 1:   Jim    CA       TX
# 2:  Mick    WA       LA

DT2[DT1, nomatch=NA]
#    first state newstate
# 1:   Jim    CA       TX
# 2:  Mick    WA       LA
# 3:  Paul    OR       NA
# 4:   Ron    AZ       NA
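
The joins above return the matched rows but leave the state column as it was. To get the result asked for in the question, one option is an update join, sketched here under the assumption that a data.table version supporting the on= argument is installed. The tables are rebuilt so the example stands alone.

```r
library(data.table)

DT1 <- data.table(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"))
DT2 <- data.table(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"))

# Update DT1 by reference: rows matched on "first" take newstate from DT2
# (the i. prefix refers to columns of the table in the i position);
# rows without a match keep their existing state
DT1[DT2, on = "first", state := i.newstate]
DT1
```

Because := modifies DT1 in place, no reassignment is needed.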

answered 2014-10-04T03:23:41.737