
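Is there a way to make matching values in bulk more programmatic? Basically, I want to add a bunch of lookup-value columns to a data frame, but I don't want to write out the match() call every time. This seems like a use case for mapply, but I can't quite figure out how to use it here. Any suggestions?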

Here's the data:

data <- data.frame(
    region = sample(c("northeast","midwest","west"), 50, replace = T),
    climate = sample(c("dry","cold","arid"), 50, replace = T),
    industry = sample(c("tech","energy","manuf"), 50, replace = T))

And the corresponding lookup table:

lookups <- data.frame(
    orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
    look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3")
    )    

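What I'm doing right now is this: first I add a column to "data" called "reg_lookup" that matches each region to the appropriate value in "lookups". Then I do the same for "clim_lookup", and so on.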

Right now, I'm stuck with this mess:

data$reg_lookup <- lookups$look_val[match(data$region, lookups$orig_val)]
data$clim_lookup <- lookups$look_val[match(data$climate, lookups$orig_val)]
data$indus_lookup <- lookups$look_val[match(data$industry, lookups$orig_val)]

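I tried writing a function to do this, but the function doesn't seem to work, so applying it with mapply is a no-go (plus I'm confused about how the mapply syntax would work here):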

match_fun <- function(df, newval, df_look, lookup_val, var, ref_val) {
    df$newval <- df_look$lookup_val[match(df$var, df_look$ref_val)]
    return(df)
}

data2 <- match_fun(data, reg_2, lookups, look_val, region, orig_val)

1 Answer


I think you just want to do this:

data <- merge(data,lookups[1:3,],by.x = "region",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[4:6,],by.x = "climate",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[7:9,],by.x = "industry",by.y = "orig_val",all.x = TRUE)

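But it would be better to store the lookups in separate data frames. That way you can control the names of the new columns more easily. It also lets you do something like this: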

lookups1 <- split(lookups,rep(1:3,each = 3))
colnames(lookups1[[1]]) <- c('region','reg_lookup')
colnames(lookups1[[2]]) <- c('climate','clim_lookup')
colnames(lookups1[[3]]) <- c('industry','indus_lookup')

do.call(cbind,mapply(merge,
        x = list(data[,1,drop = FALSE],data[,2,drop = FALSE],data[,3,drop = FALSE]),
        y = lookups1,
        MoreArgs = list(all.x = TRUE),
        SIMPLIFY = FALSE))

And you should be able to wrap that do.call bit in a function.
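As a rough sketch of what such a wrapper could look like, here is one built on match() instead of merge(). That swap is deliberate: merge() sorts its result on the key column by default, while match() leaves the rows of data in their original order. The helper name add_lookups and its arguments are made up for illustration, and the small data frame is a fixed stand-in for the random sample in the question.

```r
# Small fixed stand-ins for the data and lookup table from the question.
data <- data.frame(
    region   = c("west", "northeast", "midwest", "west"),
    climate  = c("dry", "arid", "cold", "cold"),
    industry = c("manuf", "tech", "energy", "tech"),
    stringsAsFactors = FALSE)

lookups <- data.frame(
    orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
    look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3"),
    stringsAsFactors = FALSE)

# add_lookups() is a hypothetical helper, not part of any package.
#   cols:      names of the columns in df to look up
#   new_names: names of the columns to create, one per entry in cols
#   key/value: names of the lookup table's "from" and "to" columns
add_lookups <- function(df, cols, new_names, lookup,
                        key = "orig_val", value = "look_val") {
    stopifnot(length(cols) == length(new_names))
    for (i in seq_along(cols)) {
        # [[ ]] accepts the column name as a string; df$newval would instead
        # refer to a column literally named "newval".
        idx <- match(df[[cols[i]]], lookup[[key]])
        df[[new_names[i]]] <- lookup[[value]][idx]
    }
    df
}

data2 <- add_lookups(data,
                     cols = c("region", "climate", "industry"),
                     new_names = c("reg_lookup", "clim_lookup", "indus_lookup"),
                     lookup = lookups)
```

Values with no entry in the lookup table come back as NA, which mirrors what all.x = TRUE does in the merge version.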

I used data[,1,drop = FALSE] to keep each piece as a one-column data frame.

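The way you structure the mapply call is to pass the named arguments as lists (the x = and y = parts). I wanted to make sure all the rows from data are kept, so I passed all.x = TRUE via MoreArgs so that it gets passed along each time merge is called. Finally, I need to stitch the pieces together myself, so I turned off SIMPLIFY.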
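To make those three pieces of the mapply call concrete, here is a minimal sketch that applies the same pattern to the match() expression from the question rather than to merge(). The anonymous function and its argument names (x, table, values) are chosen for illustration only, and the data frame is a small fixed stand-in for the sampled one.

```r
data <- data.frame(
    region   = c("west", "northeast", "midwest"),
    climate  = c("dry", "arid", "cold"),
    industry = c("manuf", "tech", "energy"),
    stringsAsFactors = FALSE)

lookups <- data.frame(
    orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
    look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3"),
    stringsAsFactors = FALSE)

new_cols <- mapply(
    function(x, table, values) values[match(x, table)],
    # Vectorised argument: one list element (here, one column) per call.
    x = as.list(data[c("region", "climate", "industry")]),
    # MoreArgs: passed unchanged to every call.
    MoreArgs = list(table = lookups$orig_val, values = lookups$look_val),
    # SIMPLIFY = FALSE: return a plain list, one element per call.
    SIMPLIFY = FALSE)

# Name the results and attach them to the original data frame.
names(new_cols) <- c("reg_lookup", "clim_lookup", "indus_lookup")
data[names(new_cols)] <- new_cols
```

Because the lookup table and value vector are identical for every column, they go in MoreArgs; only the column being looked up varies from call to call.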

Answered 2014-03-26T18:06:34.797