decision-tree - 启发式算法的最优决策树

Question

我有以下格式的启发式：

if a == 1 and b == 1 and c == 1:
    do something
if a == 1 and b == 1 and c == 2:
    do something
if a == 3 and b == 2 and c == 1:
    do something
if a == 2 and b == 2 and c == 1:
    do something
if a == 3 and b == 1 and c == 3:
    do something
etc.

当然，这会产生许多不必要的比较，如果语句嵌套如下，则可以避免这些比较：

if a == 1:
    if b == 1:
        if c == 1:
            do something
        if c == 2:
            do something
etc.

假设(a, b, c)我有一个案例的元组集合是有限的，并且每个元组都具有相同的被算法接收的可能性，那么我如何生成一个最优的决策树，即它进行的比较次数最少一般情况，还是在所有输入都经过它时最小化比较总数？

我想象这样的事情：

In: optimal_tree([(1, 1, 1), (1, 1, 2)])
Out: "if a == 1:
          if b == 1:
              if c == 1:
                  do something
              if c == 2:
                  do something"

In: optimal_tree([(1, 1), (2, 1), (2, 2)])
Out: "if a == 2:
          if b == 1:
              do something
          if b == 2:
              do something
      if a == 1:
          do something"

     OR

     "if b == 1:
          if a == 1:
              do something
          if a == 2:
              do something
      if b == 2:
          do something"

score 1 · Accepted Answer

Rule engines and database queries regularly deal with this problem too. You might want to look into the implementation behind those.

They have several ways to deal with this (although none is perfect):

Do the comparisons in the order defined in the Rule/Query.
Reorder the comparison based on statistical information. Databases will for example to first handle the indexed columns with the most value variance.

If you want to make your algorithm faster, you might want to look into hashing & indexing if you're not doing that already. That brings in a much bigger scale advantage.

decision-tree - 启发式算法的最优决策树

1 回答 1

Related

Reference