algorithm - Minimal-change algorithm which maximises 'swapping'

Question

This is a question on combinatorics from a non-mathematician, so please try to bear with me!

Given an array of n distinct characters, I want to generate subsets of k characters in a minimal-change order, i.e. an order in which generation i+1 contains exactly one character that was not in generation i. That's not too hard in itself. However, I also want to maximise the number of cases in which the character that is swapped out in generation i+1 is the same character that was swapped in in generation i. To illustrate, for n=7, k=3:

abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg

The asterisked strings indicate the case I want to maximise; e.g. the e that is new in generation 3, abe, replaces a d that was new in generation 2, abd. It doesn't seem possible to have this happen in every generation, but I want it to happen as often as possible.

Typical array sizes that I use are 20-30 and subset sizes around 5-8.

I'm using an odd language, Icon (or actually its derivative Unicon), so I don't expect anyone to post code that I can used directly. But I will be grateful for answers or hints in pseudo-code, and will do my best to translate C etc. Also, I have noticed that problems of this kind are often discussed in terms of arrays of integers, and I can certainly apply solutions posted in such terms to my own problem.

Thanks

Kim Bastin

Edit 15 June 2010:

I do seem to have got into deeper water than I thought, and while I'm grateful for all answers, not all of them have been relevant. As an example of a solution which is NOT adequate, let me post my own Unicon procedure for generating k-ary subsets of a character set s in a minimal change order. Things you need to know to understand the code are: a preposed * means the size of a structure, so if s is a string, *s means the size of s (the number of characters it contains). || is a string concatenation operation. A preposed ! produces each element of a structure, e.g. each character of a string, in turn on successive passes. And the 'suspend' control structure returns a result from a procedure, but leaves the procedure 'in suspense', with all local variables in place, so that new results can be produced if the procedure is called in a loop.

procedure revdoor(s, k)  
# Produces all k-subsets of a string or character set s in a 'revolving  
# door' order. Each column except the first traverses the characters  
# available to it in alphabetical and reverse alphabetical order  
# alternately. The order of the input string is preserved. 
# If called in a loop as revdoor("abcdefg", 3), 
# the order of production is: abc, abd, abe, abf, abg, acg, acf, ace, acd,  
# ade, adf, adg, aeg, aef, afg, bfg, bef, beg, bdg, bdf, bde, bcd, bce,  
# bcf, bcg, cdg, cdf, cde, cef, ceg, cfg, dfg, deg, def, efg  

local i  
static Ctl  

if /Ctl then {             # this means 'if Ctl doesn't exist'
  if k = 0 then return ""  
  Ctl := list(k, 1)        # a list of k elements, each initialised to 1.
}  

if Ctl[k] = 1 then {  
  if k = 1 then suspend !s else  
    every i := 1 to *s-k+1 do {  
      suspend s[i] || revdoor(s[i+1:0], k-1)  
    }   
  } else {  
    if k = 1 then suspend !reverse(s) else  
    every i := -k to -*s by -1 do {  
      suspend s[i] || revdoor(s[i+1:0], k-1)  
    }  
  }  

  # the following line multiplies element k of Ctl by -1 if k < size of Ctl
  # (this controls the order of generation of characters), 
  # and destroys Ctl on final exit from the procedure.
  if k < *Ctl then Ctl[k] *:= -1 else Ctl := &null   

end

Note that the output of the above procedure is not optimal in my sense. One result of my investigations so far is that the maximum 'swapping score' for a list of k-ary subsets of n elements is not less than comb(n-1, k), or in the case of n=7, k=3, the maximum score is at least comb(6, 3) = 20. I define the 'swapping score' of a list as the number of items in the list whose new element replaces an element in the previous item which was itself new. I haven't got the mathematical equipment to prove this, but it is easy to see with k=1 or k=2. For certain (n,k) a slightly higher score is possible, as in the case of n=7, k=3:

abc abd abe abf abg
acg adg aeg afg
efg dfg cfg bfg
beg bdg bcg
bcd bce bcf
bdf bef
def cef aef
adf acf
acd ace
ade
bde cde
cdf cdg
ceg
deg (swapping score = 21)

It may be noted that the above list is in 'strong minimal change order' (like word golf: the new character is always in the same position as the character it replaces), which may indicate the direction my own work is taking. I hope to post something more in a few days.

Kim

score 1 · Accepted Answer

这很简单。为了最大限度地替换，只需将字符视为数字并将字符串加一，直到达到上限。然后检查您是否没有在字符串中两次使用相同的字符。我认为这会起作用：

char c[] = {'a', 'b', 'c', 'd', 'e'};
const int n = 5;
const int k = 3;
char s[k];

void print()
{
    for( int i = 0; i < k; ++i )
        putchar(c[s[i]]);
    putchar('\n');
}

bool increment( int m )
{
    // reached the limit?
    if( ++s[m] == n && m == 0 )
        return false;

    next:   
    for( int i = 0; i < m; ++i )
    {
        if( s[m] == n )
        {
            // carry
            s[m] = 0;
            if( !increment( m-1 ))
                return false;
            goto next;
        }
        else if( s[i] == s[m] )
        {
            // the character is already used
            ++s[m];
            goto next;
        }   
    }
    return true;
}

int main(int, char**)
{   
    // initialise
    for( int i = 0; i < k; ++i )
        s[i] = i;

    // enumerate all combinations
    do
        print();
    while(increment(k-1));
}

score 0 · Accepted Answer

For a good answer, would computing the list of combinations all at once be acceptable, or do you need to compute them one at a time? In other words, do you need a function:

Combination nextCombo();

or would

vector<Combination> allCombinations();

be acceptable?

If computing the combinations in batch is permissible, it is possible that an iterative-deepening A* search (or just an A* search but I suspect it'd run out of memory) would work. With an admissible heuristic, A* is guaranteed to give the optimum. I'm short of time, so I decided to post this partial answer and edit the post if I get time to write code.

A* is a graph search algorithm. In this case, the nodes are lists of combinations used so far (no duplicates allowed in the list). My plan was to use a bit-string representation for the nodes. n=30 would fit into a 32 bit integer. We can arbitrarily permute any solution so that the first combination begins with 0's and ends in 1's, i.e. 000...1111. A node with a shorter list is connected to a longer one if the two lists are the same up until the last element and the last element differs only in having one 0'bit flipped to a 1 and one 1 bit flipped to a 0. The distance between the two is 0 if there was a "swap" and 1 if there was a swap.

A second representation for each combination is a sorted list of the bits that are turned on. One possible admissible heuristic for this graph is to use this index list. Each time (in the list of combinations) an index is used at a particular position in the index list, mark it off. The number of positions with un-used indices - the current last changed index is (I believe) the minimal number of swaps that need to happen.

To illustrate this heuristic, consider the sequence abc abd abe* abf* abg afg from above. (the letters would be numbers in my treatment, but that is a minor difference). This sequence (which would be one node in the search-graph) would have the following places marked:

 1   2   3
*a      
 b  *b   
 c   c  *c
 d   d  *d
 e   e  *e
    *f  *f
        *g

Thus the heuristic would predict that there is at least one swap required (since there are no unmarked elements in position 3 and the current position is 2).

If I get the time, I'll edit this to post code and performance of the algorithm.

Re: the NP completeness result (in a comment to Zac Thompson's answer). The graph on which we are searching for a minimal cost Hamiltonian path has a very special structure. For example, the normally NP-complete Hamiltonian Path problem can be solved in O(n) time with the "enumerate all combinations" algorithm - with n being the number of nodes in the graph. This structure makes it possible that, though on a general graph, vertex cover is hard, on your graph it may be polynomial (even linear or quadratic). Of course, since the graph has a lot of nodes for e.g. n=30, k=8 you may still have a lot of computation ahead of you.

score 0 · Accepted Answer

我没有从算法开始，而是尝试想一种方法来找到最大“交换分数”的形式，以便您知道要拍摄什么。通常可以从这样的证明中产生产生所需结构的算法。

自从大学以来已经有很长时间了，但我试图想出一个组合模型来帮助解决这个问题，但运气不佳。

我首先将组合集想象为图中的顶点，其边对应于组合的“邻接”（只有一个元素差异）。所以：

“n 选择 k” 个顶点
每个顶点的度数为 k(nk)
边数 = "n 选择 k" * k(nk) / 2 = "n 选择 2" * "n-2 选择 k-1"

这些图有很多对称性。对于任何给定的 {n,k}，该图与 {n,nk} 的图相同。如果 k=1 或 k=n-1 它是 n 个顶点上的完整图（每个组合与所有其他组合仅相差一个字符）。不过，我看不到一个明显的算法。

编辑：我的下一个想法是用稍微不同的解释来构思图表。您可以将每个 {n,k} 组合视为有 k 个 1 的 n 位序列。1 的位置对应于组合中存在的 n 个字符中的哪一个。所以对于 n=7 k=3，abc 是 1110000，adg 是 1001001，efg 是 0000111。这样，您还可以想象位于 n 维超立方体角上的点。因此，对于给定的子序列，如果边缘是共面的，则它们与您的“最小交换”标准相匹配：我认为它们是通过超立方体的“切割平面”。

您正在通过此组合图寻找一条符合您的特殊标准的哈密顿路径。

考虑问题的另一种方法是最小化序列中更改组合中的哪个字符的次数。

score 0 · Accepted Answer

Kim，您的问题描述听起来非常像（家庭作业）尝试描述枚举一组 n 元素的所有 k 组合的最简单解决方案 - 而不会太容易地给出实际解决方案。无论如何，请参阅下面的我的镜头。我使用了 Java，但重要的部分与 C 没有什么不同。

public class Homework
{
  /**
   * Prints all k-combinations of a set of n elements. Answer to this 
   * question: http://stackoverflow.com/questions/2698551
   */
  public static void main(String[] args)
  {
    Combinations combinations = new Combinations(7, 3);
    System.out.printf(
        "Printing all %d %d-combinations of a set with %d elements:\n", 
        combinations.size(), combinations.k, combinations.n);
    for (int[] c : combinations)
      System.out.println(Arrays.toString(c));
  }

  /**
   * Provides an iterator for all k-combinations of a set of n elements. 
   */
  static class Combinations implements Iterable<int[]>  
  {
    public final int n, k;

    public Combinations(int n, int k)
    {
      if (n < 1 || n < k)
        throw new IllegalArgumentException();
      this.n = n;
      this.k = k;
    }

    @Override
    public Iterator<int[]> iterator()
    {
      return new Iterator<int[]>()
      {
        private int[] c;

        @Override
        public void remove() { throw new UnsupportedOperationException(); }

        @Override
        public int[] next()
        {
          if (c == null)
          {
            c = new int[k];
            for (int i = 0; i < k; i++)
              c[i] = i;
          }
          else
          {
            int i = c.length - 1;
            while (i >= 0 && c[i] == n - k + i)
              i--;

            if (i < 0)
              throw new NoSuchElementException();

            c[i]++;
            for (int j = i + 1; j < c.length; j++)
              c[j] = c[i] + j - i;
          }
          return c.clone(); // remove defensive copy if performance is more important
        }

        @Override
        public boolean hasNext() { return c == null || c[0] < n - k; }
      };
    }

    /**
     * Returns number of combinations: n! / (k! * (n - k)!).
     */
    public BigInteger size()
    {
      BigInteger s = BigInteger.valueOf(n);
      for (int i = n - 1; i > n - k; i--)
        s = s.multiply(BigInteger.valueOf(i));
      for (int i = k; i > 1; i--)
        s = s.divide(BigInteger.valueOf(i));
      return s;
    }
  }
}

这是您的示例的输出：

Printing all 35 3-combinations of a set with 7 elements:
[0, 1, 2] [0, 1, 3] [0, 1, 4] [0, 1, 5] [0, 1, 6] [0, 2, 3] [0, 2, 4] [0, 2, 5] [0, 2, 6] [0, 3, 4] 
[0, 3, 5] [0, 3, 6] [0, 4, 5] [0, 4, 6] [0, 5, 6] [1, 2, 3] [1, 2, 4] [1, 2, 5] [1, 2, 6] [1, 3, 4] 
[1, 3, 5] [1, 3, 6] [1, 4, 5] [1, 4, 6] [1, 5, 6] [2, 3, 4] [2, 3, 5] [2, 3, 6] [2, 4, 5] [2, 4, 6] 
[2, 5, 6] [3, 4, 5] [3, 4, 6] [3, 5, 6] [4, 5, 6]

score 0 · Accepted Answer

我在 2010 年解决了这个问题，但当时没有找到解决方案。几天前，我再次查看了我当时的一些笔记，并怀疑我已经非常接近解决方案了。几分钟后，我拿到了钥匙。

概括地说：要求是字符串 s 的 k 子集的严格最小更改排序，以使 LIFO（后进先出）最大化。我在之前的帖子中将此称为最大化“交换”。

在关键要求之后，我将算法称为 maxlifo（最大化 LIFO）。它有两个参数，一个字符串 s，它不能包含重复的字符，以及一个不大于 s 大小的正整数 k。该算法是递归的，即maxlifo(s, k) 使用maxlifo(s, k-1) 的输出直到k=1。输出作为列表返回。

下面我给出一个非正式的解释，用例子，使用字符串“abcdefg”和k的各种值。接下来是一个作为 Unicon 过程的实现示例。（我不会流利使用任何更常用的语言。）

k=1 的情况很简单——它按从第一个到最后一个的顺序返回 s 的元素。

对于 k>1，将以下规则应用于 maxlifo(s, k-1) 的输出：

(1) 对于 maxlifo(s, k-1) 输出的每个元素，依次列出从该元素构建的 k 子集，其中每个缺失字符 s。子集中字符的顺序与 s 中的一样。

(2) 从第二行开始，用空白的“占位符”替换前面一行中出现的所有子集。s 的每个 k 子集现在只出现一次。

(3) 在每个非空白行中，用首字母 ! 每个子集，以便在下一行的相同位置有一个占位符。这个标记的意思是“第一”。在每个非空白行中，恰好一个子集将被如此标记。

(4) 删除所有完全空白的行（仅包含占位符）。

(5) 在除最后一行之外的每一行中，用最后一行标记！与下一行中标记为“第一”的子集相对应的位置的子集。这个标记的意思是“最后一个”。

现在我们可以列出子集的最终 maxlifo 排序。从上到下对每一行进行排序，并将其元素按该顺序添加到输出列表中。

(6) 自上而下的每一行：

(6.1) 删除所有空白占位符。

(6.2) 将标记为“first”（初始！）的子集添加到输出列表中，并将其从行中删除。

(6.3) 如果行中仍有剩余子集，则最左边或最右边的子集将被标记为“最后一个”（最终！）。如果最右边的子集标记为“最后一个”，则将子集按从左到右的顺序添加到输出列表中，否则按照从右到左的顺序添加。

(7) 处理完所有行后，返回输出列表。

使用 maxlifo("abcdefg", 2) 的示例：

Col1 包含 maxlifo("abcdefg", 1) 的输出。Col2 的行包含由 s 的剩余字符形成的团：

Col1    Col2
----    ----------------------------
a       ab   ac   ad   ae   af   ag
b       ab   bc   bd   be   bf   bg
c       ac   bc   cd   ce   cf   cg
d       ad   bd   cd   de   df   dg
e       ae   be   ce   de   ef   eg
f       af   bf   cf   df   ef   fg
g       ag   bg   cg   dg   eg   fg

清除出现在前面行中的子集：

a       ab   ac   ad   ae   af   ag
b            bc   bd   be   bf   bg
c                 cd   ce   cf   cg
d                      de   df   dg
e                           ef   eg
f                                fg
g

标记每行中的“第一个”子集（下面有空格的那个）：

a      !ab   ac   ad   ae   af   ag
b           !bc   bd   be   bf   bg
c                !cd   ce   cf   cg
d                     !de   df   dg
e                          !ef   eg
f                               !fg
g

删除所有完全空白的行（在这种情况下只有一个）：

a      !ab   ac   ad   ae   af   ag
b           !bc   bd   be   bf   bg
c                !cd   ce   cf   cg
d                     !de   df   dg
e                          !ef   eg
f                               !fg

标记每行中的“最后一个”子集（下面有“第一个”子集的那个）。

a      !ab   ac!  ad   ae   af   ag
b           !bc   bd!  be   bf   bg
c                !cd   ce!  cf   cg
d                     !de   df!  dg
e                          !ef   eg!
f                               !fg

按上述顺序输出每一行：'first'、unmarked、'last'：

                                               Ordered rows:
a      !ab   ac!  ad   ae   af   ag            ab ag af ae ad ac
b           !bc   bd!  be   bf   bg            bc bg bf be bd
c                !cd   ce!  cf   cg            cd cg cf ce
d                     !de   df!  dg            de dg df
e                          !ef   eg!           ef eg
f                               !fg            fg

输出： [ab af ae ad ac bc bg bf be bd cd cg cf ce df dg de ef eg fg]

3 <= k <= 6 的示例给出的细节较少。步骤 4 中删除的空白行保留在原处。

maxlifo（“abcdefg”，3）：

                                                       Ordered rows:
ab     !abc   abd   abe   abf   abg!                   abc abd abe abf abg
ag            acg   adg   aeg! !afg                    afg acg adg aeg
af            acf   adf! !aef                          aef acf adf
ae            ace! !ade                                ade ace
ad           !acd!                                     acd
ac                     
bc           !bcd   bce   bcf   bcg!                   bcd bce bcf bcg
bg                  bdg   beg! !bfg                    bfg bdg beg
bf                  bdf! !bef                          bef bdf
be                 !bde!                               bde 
bd                                  
cd                 !cde   cdf   cdg!                   cde cdf cdg
cg                        ceg! !cfg                    cfg ceg
cf                       !cef!                         cef
ce                             
de                       !def   deg!                   def deg
dg                             !dfg!                   dfg
df                             
ef                             !efg                    efg
eg                       
fg

输出： [abc abd abe abf abg afg acg adg aeg aef acf adf ade ace acd
bcd bce bcf bcg bfg bdg beg bef bdf bde cde cdf cdg cfg ceg cef def deg dfg efg]

maxlifo("abcdefg", 4):

                                            Ordered rows:
abc         !abcd  abce!  abcf   abcg       abcd abcg abcf abce
abd               !abde   abdf!  abdg       abde abdg abdf
abe                      !abef   abeg!      abef abeg
abf                             !abfg!      abfg
abg                                   
afg                acfg!  adfg  !aefg       aefg adfg acfg
acg               !acdg   aceg!             acdg aceg
adg                      !adeg!             adeg
aeg                                  
aef                acef! !adef              adef acef
acf               !acdf!                    acdf
adf                                  
ade               !acde!                    acde
ace                                   
acd                                   
bcd               !bcde   bcdf!  bcdg       bcde bcdg bcdf
bce                      !bcef   bceg!      bcef bceg
bcf                             !bcfg!      bcfg
bcg                                  
bfg                       bdfg! !befg       befg bdfg
bdg                      !bdeg!             bdeg
beg                                  
bef                      !bdef!             bdef
bdf                                  
bde                                  
cde                      !cdef   cdeg!      cdef cdeg
cdf                             !cdfg!      cdfg
cdg                                   
cfg                             !cefg!      cefg
ceg                                  
cef                                  
def                             !defg       defg
deg                          
dfg                          
efg

输出： [abcd abcg abcf abce abde abdg abdf abef abeg abfg aefg adfg acfg acdg aceg adeg adef acef acdf acde bcde bcdg bcdf bcef bceg bcfg befg bdfg bdeg bdef cdef cdeg cdfg cefg defg]

maxlifo（“abcdefg”，5）：

                                            Ordered rows:
abcd       !abcde   abcdf   abcdg!          abcde abcdf abcdg 
abcg                abceg! !abcfg           abcfg abceg 
abcf               !abcef!                  abcef 
abce                                
abde               !abdef   abdeg!          abdef abdeg 
abdg                       !abdfg!          abdfg 
abdf                         
abef                       !abefg!          abefg 
abeg                         
abfg                         
aefg                acefg! !adefg           adefg acefg 
adfg               !acdfg!                  acdfg 
acfg                         
acdg               !acdeg!                  acdeg 
aceg                         
adeg                         
adef               !acdef!                  acdef 
acef                         
acdf                         
acde                         
bcde               !bcdef   bcdeg!          bcdef bcdeg 
bcdg                       !bcdfg!          bcdfg 
bcdf                         
bcef                       !bcefg!          bcefg 
bceg                         
bcfg                         
befg                       !bdefg!          bdefg 
bdfg                         
bdeg                         
bdef                         
cdef                       !cdefg           cdefg 
cdeg                         
cdfg                         
cefg                         
defg

输出：[abcde abcdf abcdg abcfg abceg abcef abdef abdeg abdfg abefg adefg acefg acdfg acdeg acdef bcdef bcdeg bcdfg bcefg bdefg cdefg]

maxlifo（“abcdefg”，6）：

                                            Ordered rows:
abcde               !abcdef   abcdeg!       abcdef abcdeg 
abcdf                        !abcdfg!       abcdfg 
abcdg                               
abcfg                        !abcefg!       abcefg 
abceg                              
abcef                               
abdef                        !abdefg!       abdefg 
abdeg                               
abdfg                               
abefg                               
adefg                               
acefg                        !acdefg!       acdefg 
acdfg                               
acdeg                               
acdef                               
bcdef                        !bcdefg        bcdefg 
bcdeg                               
bcdfg                               
bcefg                               
bdefg                               
cdefg

输出：[abcdef abcdeg abcdfg abcefg abdefg acdefg bcdefg]

统一实施：

procedure maxlifo(s:string, k:integer)
# A solution to my combinatorics problem from 2010.
# Return a list of the k subsets of the characters of a string s
# in a minimal change order such that last-in first-out is maximised.
# String s must not contain duplicate characters and in the present 
# implementation must not contain "!", which is used as a marker.

  local ch, cand, Hit, inps, i, j, K, L, Outp, R, S

  # Errors
  if *cset(s) ~= *s then 
    stop("Duplicate characters in set in maxlifo(", s, ", ", k, ")")
  if find("!", s) then 
    stop("Illegal character in set in maxlifo(", s, ", ", k, ")")
  if k > *s then 
    stop("Subset size larger than set size in maxlifo(", s, ", ", k, ")")

  # Special cases
  if k = 0 then return []
  if k = *s then return [s]

  Outp := []
  if k = 1 then {
    every put(Outp, !s)
    return Outp
  }

  # Default case
  S := set()
  K := []

  # Build cliques from output of maxlifo(s, k-1) with the remaining 
  # characters in s, substituting empty strings as placeholders for 
  # subsets already listed.
  every inps := !maxlifo(s, k-1) do { 
    R := []
    every ch := !s do
      if not find(ch, inps) then {
        cand := reorder(inps ++ ch, s)
        if member(S, cand) then cand := "" else insert(S, cand)
        put(R, cand)
      }
    put(K, R)
  }

  # Mark ‘first’ subset in each row with initial "!"
  every i := 1 to *K - 1 do {
    every j := 1 to *K[i] do
      if K[i, j] ~== "" & K[i+1, j] == "" then {
        K[i, j] := "!" || K[i, j]
        break
      }
  }

  # Remove rows containing only placeholders
  every i := *K to 1 by -1 do {
    every if !K[i] ~== "" then break next
    delete(K, i)  
  }

  # Mark ‘last’ subset in each row with final "!"
  every i := 1 to *K - 1 do 
    every j := 1 to *K[i] do 
      if K[i+1, j][1] == "!" then {
        K[i, j] ||:= "!"
        break
      }

  # Build output list
  every R := !K do {

    # Delete placeholders from row (no longer needed and in the way)
    every j := *R to 1 by -1 do if R[j] == "" then delete(R, j)

    # Handle ‘first’ subset and remove from row
    # N.B. ‘First’ subset will be leftmost or rightmost in row
    if R[1][1] == "!" then 
      put(Outp, trim(get(R), '!', 0)) 
      else put(Outp, trim(pull(R), '!', 0))   

    # Handle any remaining subsets, ‘last’ subset last, stripping '!' markers
    # N.B. ‘Last’ subset will be leftmost or rightmost in row after removal
    # of ‘first’ subset.
    if R[-1][-1] == "!" then while put(Outp, trim(get(R), '!', 0)) else
      while put(Outp, trim(pull(R), '!', 0))
  }

  return Outp

end


procedure reorder(cs:cset, s:string)
# Reorder cset cs according to string s

  local r
  # If no s, return set in alphabetical order
  if /s then return string(cs)

  r := ""
  s ? while tab(upto(cs)) do r ||:= move(1)
  return r

end

algorithm - Minimal-change algorithm which maximises 'swapping'

5 回答 5

Related

Reference