algorithm - Best way to iterate over a list of lists to maximize unique outputs

Question

I have a list of lists whose contents are character vectors. For example:

yoda <- list(a=list(c("A","B","C"), c("B","C","D")), b=list(c("D","C"), c("B","C","D","E","F")))

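This is a shortened version of what I am actually trying to do; in my real case there are 11 list members, each with about 12 sub-lists. For each list member I need to pick one sub-list, e.g. one "a" list and one "b" list. I want to find which combination of sub-lists gives the greatest number of unique values. In this simple example it would be the first sub-list in "a" and the second sub-list in "b", giving the final answer: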

c("A","B","C","D","E","F")

At the moment I just have a huge number of nested loops, and it seems to take forever. Here is the poor piece of code:

res <- list()
for (a in 1:length(extra.pats[[1]])) {
  for (b in 1:length(extra.pats[[2]])) {
    for (c in 1:length(extra.pats[[3]])) {
      for (d in 1:length(extra.pats[[4]])) {
        for (e in 1:length(extra.pats[[5]])) {
          for (f in 1:length(extra.pats[[6]])) {
            for (g in 1:length(extra.pats[[7]])) {
              for (h in 1:length(extra.pats[[8]])) {
                for (i in 1:length(extra.pats[[9]])) {
                  for (j in 1:length(extra.pats[[10]])) {
                    for (k in 1:length(extra.pats[[11]])) {
                      # unique() only deduplicates its first argument, so combine the vectors with c() first
                      res[[paste(a,b,c,d,e,f,g,h,i,j,k, sep="_")]] <- unique(c(extra.pats[[1]][[a]], extra.pats[[2]][[b]], extra.pats[[3]][[c]], extra.pats[[4]][[d]], extra.pats[[5]][[e]], extra.pats[[6]][[f]], extra.pats[[7]][[g]], extra.pats[[8]][[h]], extra.pats[[9]][[i]], extra.pats[[10]][[j]], extra.pats[[11]][[k]]))
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

If anyone has any ideas on how to do this properly, that would be great.

score 3 · Accepted Answer

Here is a suggestion:

# create all possible combinations
comb <- expand.grid(yoda)

# find unique values for each combination
uni <- lapply(seq(nrow(comb)), function(x) unique(unlist(comb[x, ])))

# count the unique values
len <- lengths(uni)

# extract longest combination  
uni[which.max(len)]

[[1]]
[1] "A" "B" "C" "D" "E" "F"
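
The approach above returns the winning set of values but not which sub-lists produced it. A minimal sketch of one way to recover that, assuming the `yoda` data from the question; the names `idx`, `n.unique` and `best` are illustrative and not part of the original answer. It enumerates sub-list indices rather than the sub-lists themselves:

```r
# Toy data from the question
yoda <- list(a = list(c("A","B","C"), c("B","C","D")),
             b = list(c("D","C"), c("B","C","D","E","F")))

# One row per combination; each column holds the sub-list index
# chosen for the corresponding member of `yoda`
idx <- expand.grid(lapply(yoda, seq_along))

# Number of distinct values covered by each combination
n.unique <- apply(idx, 1, function(row) {
  picked <- Map(function(member, i) member[[i]], yoda, row)
  length(unique(unlist(picked)))
})

best <- which.max(n.unique)
idx[best, ]      # chosen sub-list index per member
n.unique[best]   # size of the resulting union
```

Like the original, this builds every combination in memory, so it is only practical when the product of the sub-list counts is small.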

score 2 · Accepted Answer

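The dimensions of your problem rule out an exhaustive search. Here is an example of a suboptimal algorithm. It is simple, but you may find that it gives you "good enough" results.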
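
The size of the search space can be checked with a quick calculation. This sketch assumes the shape described in the question (11 members with exactly 12 sub-lists each, which the question only states approximately); `n.sublists` and `n.comb` are illustrative names:

```r
# Assumed shape: 11 list members, 12 sub-lists each
n.sublists <- rep(12, 11)

# Number of ways to pick one sub-list per member: 12^11 = 743,008,370,688
n.comb <- prod(n.sublists)
format(n.comb, big.mark = ",", scientific = FALSE)
```

For a real list such as `extra.pats`, `prod(lengths(extra.pats))` would give the corresponding count.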

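The algorithm goes as follows:

1. Look at your first list: pick the item with the most unique values.
2. Look at the second list: pick the item that brings the most new unique values in addition to those of the item you picked in step 1.
3. Repeat until you reach the end of your list.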

The code:

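good.cover <- function(top.list) {
    # one slot per top-level list member, filled in order
    selection <- vector("list", length(top.list))
    # number of values in y that are not already in x
    num.new.unique <- function(x, y) length(setdiff(y, x))
    for (i in seq_along(top.list)) {
        # score each candidate by how many new values it adds to the selection so far
        score <- sapply(top.list[[i]], num.new.unique, x = unlist(selection))
        # keep the best candidate (ties go to the first one)
        selection[[i]] <- top.list[[i]][which.max(score)]
    }
    selection
}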

Let's make up some data:

items.universe <- apply(expand.grid(list(LETTERS, 0:9)), 1, paste, collapse = "")
random.length  <- function()sample(3:6, 1)
random.sample  <- function(i)sample(items.universe, random.length())
random.list    <- function(i)lapply(letters[1:12], random.sample)
initial.list   <- lapply(1:11, random.list)

Now run it:

system.time(final.list <- good.cover(initial.list))
#    user  system elapsed 
#   0.004   0.000   0.004
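
Because each step only looks at what has been picked so far, a greedy pass can miss the best combination. The sketch below uses made-up data to show this; `toy` and `greedy.idx` are illustrative names, and `greedy.idx` is a variant of the idea in `good.cover` that returns the chosen indices rather than the author's exact function:

```r
# Made-up data on which a greedy pass misses the optimum
toy <- list(a = list(c("A","B","C"), c("D","E")),
            b = list(c("A","B","C","F")))

# Greedy pass: for each member, pick the sub-list that adds the most
# values not yet covered, and remember which index was picked
greedy.idx <- function(top.list) {
  covered <- character(0)
  picks <- integer(length(top.list))
  for (i in seq_along(top.list)) {
    gain <- sapply(top.list[[i]], function(y) length(setdiff(y, covered)))
    picks[i] <- which.max(gain)
    covered <- union(covered, top.list[[i]][[picks[i]]])
  }
  list(picks = picks, covered = covered)
}

greedy <- greedy.idx(toy)
greedy$picks                                     # first sub-list of "a", then the only one of "b"
length(greedy$covered)                           # 4 values: A, B, C, F
length(unique(c(toy$a[[2]], toy$b[[1]])))        # 6 values when a[[2]] is chosen instead
```

The same kind of check on the run above would be `length(unique(unlist(final.list)))`, which reports how many distinct values the greedy selection covers.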
