I have three data frames, as follows:

d1 <- data.frame(level1 =  c("A", "A", "B", "C", "C"), 
             level2 = c("AA", "AB", "BA", "CA", "CB"))

d2 <- data.frame(level1 =  c("A", "A", "A","B", "B", "C", "C"), 
             level3 = c("1", "2", "4", "2", "3", "1", "5"))

d3<- data.frame(level3 = c("1", "2", "3", "4", "5"), AA = c("v1", "v2", "v3", "v4", "v5"), 
            AB = c("v6", "v7", "v8", "v9", "v10"), BA = c("v11", "v12", "v13", "v14", "v15"), 
            CA = c("v16", "v17", "v18", "v19", "v20"),  CB = c("v21", "v22", "v23", "v24", "v25"))

I would like to get these three data frames as output:

A <- data.frame(level3 = c("1", "2", "4"), AA = c("v1", "v2", "v4"), AB = c("v6", "v7", "v9"))

B <- data.frame(level3 = c( "2", "3"), BA = c("v12", "v13"))

C <- data.frame(level3 = c("1", "5"), CA = c("v16", "v20"), CB = c("v21", "v25"))

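From the three data frames provided (d1, d2 and d3), I want to obtain a separate data frame for each level1 value (A, B, C, ...).

The output data frames should contain the columns specified by d1, and their rows should be the level3 values specified by d2.

For example:

According to d1, AA and AB correspond to A, so data frame A should contain those two columns.

According to d2, 1, 2 and 4 correspond to A, so those should be the rows of data frame A.

The values in data frame A should come from d3. I hope I have explained myself clearly.

Any ideas on how to do this?

In my real data, the level1 and level2 names have nothing in common.

Thanks for your support,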

2 Answers

Use melt and dcast from reshape2, together with merge and split:

library(reshape2)
# merge three data sets together (putting d3 in long form)
full <- merge(merge(d1, d2), melt(d3, id.vars = 'level3', variable.name = 'level2'))
# split by level1, then cast each piece back to wide form (one column per level2)
results <- lapply(split(full, full$level1), dcast, formula = level3 ~ level2, value.var = 'value')

# the results are in a list; we can copy them to the global environment using `list2env`
# if you want (but you may wish to keep them as a list)
list2env(results, .GlobalEnv)
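
On the list-versus-global-environment choice mentioned in the comments above, here is a minimal base R sketch. It does not rerun the reshape2 pipeline: `results` below is a hand-built stand-in that uses the expected output from the question, and `n_rows` and `env` are illustrative names. It shows that a named list can be processed in one pass, and that `list2env` can target a fresh environment rather than `.GlobalEnv`.

```r
# Stand-in for `results`: a named list of data frames
# (values copied from the expected output in the question)
results <- list(
  A = data.frame(level3 = c("1", "2", "4"),
                 AA = c("v1", "v2", "v4"),
                 AB = c("v6", "v7", "v9")),
  B = data.frame(level3 = c("2", "3"),
                 BA = c("v12", "v13")),
  C = data.frame(level3 = c("1", "5"),
                 CA = c("v16", "v20"),
                 CB = c("v21", "v25"))
)

# Option 1: keep the list and operate on every group at once
n_rows <- vapply(results, nrow, integer(1))

# Option 2: copy the elements into a dedicated environment,
# which avoids overwriting objects named A, B or C in .GlobalEnv
env <- list2env(results, envir = new.env())
ls(env)
get("B", envir = env)
```

Keeping the list is usually the easier option when the number of level1 groups is not known in advance, because nothing downstream has to refer to the groups by hard-coded object names.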
answered 2013-08-28T06:12:11.413

This is a bit clumsy, but I think it does what you want:

# put d1 and d2 in a single table
dm <- merge(d1, d2)

# divide in individual dataframes based on level1 value
dspl <- split(dm, dm$level1)

# identify unique values for each level1 value
int1 <- lapply(dspl, apply, 2, unique)

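# create a new dataframe: x[[2]] holds the level2 column names, x[[3]] the level3 values
# (rows are matched on the level3 column rather than on d3's row names)
int2 <- lapply(int1, function(x) d3[d3$level3 %in% x[[3]], c("level3", x[[2]])])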

# get the names of the level1 value to assign to objects
ndf <- names(int2)

# assign each dataframe to an object in the global environment
dmm <- lapply(ndf, function(lab) assign(lab, int2[[lab]], .GlobalEnv)) 
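
A more compact variant of the same idea, offered only as a sketch: split d1 and d2 separately by level1, then subset d3 by matching level3 values. The names `cols_by_group`, `rows_by_group`, `groups` and `by_group` are illustrative and not part of the answer above, and the data frames are created with `stringsAsFactors = FALSE` so that every column is plain character.

```r
# Data from the question; stringsAsFactors = FALSE keeps columns as character
d1 <- data.frame(level1 = c("A", "A", "B", "C", "C"),
                 level2 = c("AA", "AB", "BA", "CA", "CB"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(level1 = c("A", "A", "A", "B", "B", "C", "C"),
                 level3 = c("1", "2", "4", "2", "3", "1", "5"),
                 stringsAsFactors = FALSE)
d3 <- data.frame(level3 = c("1", "2", "3", "4", "5"),
                 AA = c("v1", "v2", "v3", "v4", "v5"),
                 AB = c("v6", "v7", "v8", "v9", "v10"),
                 BA = c("v11", "v12", "v13", "v14", "v15"),
                 CA = c("v16", "v17", "v18", "v19", "v20"),
                 CB = c("v21", "v22", "v23", "v24", "v25"),
                 stringsAsFactors = FALSE)

# Columns (level2) and rows (level3) wanted for each level1 group
cols_by_group <- split(d1$level2, d1$level1)
rows_by_group <- split(d2$level3, d2$level1)

# Only build frames for groups that appear in both lookup tables
groups <- intersect(names(cols_by_group), names(rows_by_group))

by_group <- lapply(setNames(groups, groups), function(g) {
  keep <- d3$level3 %in% rows_by_group[[g]]
  out <- d3[keep, c("level3", cols_by_group[[g]]), drop = FALSE]
  rownames(out) <- NULL  # reset row names after subsetting
  out
})

by_group$A
```

The result is a named list with one data frame per level1 value, which can be copied to the global environment in the same way as above if separate objects are really needed.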
answered 2013-08-28T05:19:43.473