
I have expressions like the following:

{1000} AND ({1001} OR {1002} OR {1003})

The allowed operators are OR and AND, and expressions can be nested using parentheses. I have already managed to tokenize this string and convert it to an abstract syntax tree (AST) using the Shunting Yard algorithm, implemented in PHP 5.3. The above expression results in the following:

1000 1001 1002 | 1003 | &


    &
  /   \
1000   |
      / \
     |   1003
    / \
1001  1002
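Here is a minimal Python sketch of the RPN-to-tree step. Python stands in for the PHP here. Two things are illustrative assumptions: the name `parse_rpn`, and representing nodes as nested tuples `(op, left, right)` with the ids kept as strings.

```python
def parse_rpn(text):
    """Build a nested-tuple AST from a space-separated RPN string.

    Operators become ("&" or "|", left, right); operands stay strings.
    """
    stack = []
    for token in text.split():
        if token in ("&", "|"):
            right = stack.pop()   # the right operand was pushed last
            left = stack.pop()
            stack.append((token, left, right))
        else:
            stack.append(token)   # an id such as "1000"
    if len(stack) != 1:
        raise ValueError("malformed RPN expression: %r" % text)
    return stack[0]


print(parse_rpn("1000 1001 1002 | 1003 | &"))
# ('&', '1000', ('|', ('|', '1001', '1002'), '1003'))
```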

When traversing this tree, I want to output the final combinations of numbers a user can choose from. This is not possible with the given representation. What I actually need is the form obtained after applying the distributive law:

(1000 & 1001) | (1000 & 1002) | (1000 & 1003)

1000 1001 & 1000 1002 & | 1000 1003 & |

               _______________|_____________
              /                             \
      _______|____                           &
     /            \                         / \
    &              &                    1000   1003
  /   \           / \
1000   1001    1000  1002

I concluded that the only nodes allowed to be &-operator nodes are the lowest ones, which carry the leaves. All others have to be |-operator nodes.
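That condition can be checked mechanically. The tree is in the desired shape when no &-node has an |-node anywhere beneath it. Chains of nested &-nodes, as in the second example, are still allowed.

Below is a small Python sketch. It assumes the tuple representation `(op, left, right)` with string leaves. `contains_or` and `is_dnf` are hypothetical helper names.

```python
def contains_or(node):
    """True if node is an |-node or has one anywhere beneath it."""
    if isinstance(node, str):
        return False
    return node[0] == "|" or contains_or(node[1]) or contains_or(node[2])


def is_dnf(node):
    """True if no &-node has an |-node beneath it (nested &-nodes are fine)."""
    if isinstance(node, str):
        return True
    op, left, right = node
    if op == "&" and (contains_or(left) or contains_or(right)):
        return False
    return is_dnf(left) and is_dnf(right)
```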

How can I convert an arbitrary AST with the grammar explained above into one that represents all final combinations? Is it better to apply the distributive law to the tokens of the infix representation? Is it easier to work with the RPN representation instead of the tree?
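One possible approach, which is not taken from the thread itself, is to apply the distributive law while evaluating the RPN. Each stack entry becomes a list of conjunctions rather than a subtree:

- `|` concatenates the two lists.
- `&` pairs every conjunction on the left with every conjunction on the right.

A Python sketch follows; the function name `dnf_from_rpn` is hypothetical.

```python
def dnf_from_rpn(text):
    """Evaluate an RPN string of ids, "&" and "|" into a list of conjunctions.

    Each stack entry is a list of conjunctions; each conjunction is a list
    of ids that must all be chosen together.
    """
    stack = []
    for token in text.split():
        if token in ("&", "|"):
            right = stack.pop()
            left = stack.pop()
            if token == "|":
                # OR: the alternatives of both sides, side by side.
                stack.append(left + right)
            else:
                # AND: distributive law, every left conjunction with every right one.
                stack.append([a + b for a in left for b in right])
        else:
            stack.append([[token]])   # a single id is a one-element conjunction
    return stack[0]


print(dnf_from_rpn("1000 1008 & 1001 1002 | 1003 | &"))
# [['1000', '1008', '1001'], ['1000', '1008', '1002'], ['1000', '1008', '1003']]
```

Swapping the left and right subtrees only changes the order of the ids inside each conjunction.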

Please also note that more difficult examples are possible, such as:

(1000 & 1008) & (1001 | 1002 | 1003)
1000 1008 & 1001 1002 | 1003 | &
       ______ & ___
      /            \
     &             |
    / \           / \
1000   1008      |   1003
                / \
            1001  1002

I would like this to result in:

(1000 & 1008 & 1001) | (1000 & 1008 & 1002) | (1000 & 1008 & 1003)
1000 1008 & 1001 & 1000 1008 & 1002 & | 1000 1008 & 1003 & |

                        __________________|_________
                       /                            \
         _____________|_________                     &
        /                       \                   / \
       &                        &                  &   1003
      /  \                     / \                / \
     &    1001                &   1002        1000   1008
    / \                      / \
1000   1008              1000   1008
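Going the other way, a list of conjunctions can be printed in the left-nested RPN form used above. Below is a Python sketch. It assumes each conjunction is a list of id strings; `clause_to_rpn` and `clauses_to_rpn` are hypothetical names.

```python
def clause_to_rpn(clause):
    """["1000", "1008", "1001"] -> "1000 1008 & 1001 &" (left-nested &-chain)."""
    return " ".join([clause[0]] + ["%s &" % item for item in clause[1:]])


def clauses_to_rpn(clauses):
    """Join conjunctions with a left-nested |-chain: c1 c2 | c3 | ..."""
    rendered = [clause_to_rpn(clause) for clause in clauses]
    return " ".join([rendered[0]] + ["%s |" % r for r in rendered[1:]])


print(clauses_to_rpn([["1000", "1008", "1001"],
                      ["1000", "1008", "1002"],
                      ["1000", "1008", "1003"]]))
# 1000 1008 & 1001 & 1000 1008 & 1002 & | 1000 1008 & 1003 & |
```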

For another (more complicated) example, just swap the left and right subtrees, or replace 1003 with another &-node: 1003 => 1003 1009 &

What I have already tried: googling a lot, traversing the tree in pre-order and post-order, and trying to find an algorithm, all without success.

I am grateful for any hints and pointers in the right direction.


2 Answers


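What you seem to be trying to do is produce a disjunctive normal form (DNF). This is harder than it looks, because there are lots of interesting cases to handle.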

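What you want to do is exhaustively apply the following rewrite rule everywhere in your tree (in practice, working from the leaves upward is probably good enough):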

 rule distribute_and_over_or(a: term, b: term, c: term): term->term
    "  \a and (\b or \c) " ->  " \a and \b or \a and \c ";

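Hand-written, that rule amounts to a bottom-up tree rewrite. Below is a minimal Python sketch. It assumes nodes are tuples `(op, left, right)` with string leaves; `distribute` is a hypothetical name. The rule as written only matches an |-node on the right, so the sketch also handles the mirrored case `(b or c) and a` explicitly rather than relying on commutativity.

```python
def distribute(node):
    """Rewrite a & (b | c) -> (a & b) | (a & c), and the mirrored form,
    bottom-up until no &-node has an |-child."""
    if isinstance(node, str):
        return node
    op, left, right = node
    left, right = distribute(left), distribute(right)
    if op == "&":
        if isinstance(right, tuple) and right[0] == "|":
            # a & (b | c) -> (a & b) | (a & c)
            return ("|", distribute(("&", left, right[1])),
                         distribute(("&", left, right[2])))
        if isinstance(left, tuple) and left[0] == "|":
            # (b | c) & a -> (b & a) | (c & a)
            return ("|", distribute(("&", left[1], right)),
                         distribute(("&", left[2], right)))
    return (op, left, right)
```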
With complex terms you will get redundant subterms, so you probably also need these rules:

 rule subsumption_identical_or_terms(a: term): term->term
    "  \a or \a " ->  " \a ";

 rule subsumption_identical_and_terms(a: term): term->term
    "  \a and \a " ->  " \a ";

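Once the tree is flattened into a list of conjunctions (each a list of ids), both of these rules reduce to set operations. Below is a Python sketch under that assumed representation; `dedupe` is a hypothetical name. Using sets also treats `and` and `or` as commutative.

```python
def dedupe(clauses):
    """Apply a & a -> a inside each conjunction and a | a -> a across them.

    Clauses are lists of ids; order is ignored, which assumes commutativity.
    """
    seen = set()
    result = []
    for clause in clauses:
        key = frozenset(clause)        # repeated ids collapse: a & a -> a
        if key not in seen:            # repeated conjunctions collapse: a | a -> a
            seen.add(key)
            result.append(sorted(key))
    return result
```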
As you have stated the problem, you don't use "not", but it may show up eventually, so you will need the following additional rules:

 rule cancel_nots(x: term): term->term
    " not (not \x) " ->  " \x ";

 rule distribute_not_over_or(a: term, b: term): term->term
    " not( \a or \b ) " ->  " not \a  and not \b ";

 rule distribute_not_over_and(a: term, b: term): term->term
    " not( \a and \b ) " ->  " not \a  or not \b ";

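Applied top-down, the three "not" rules push every negation down to the leaves (negation normal form). After that, only the distribution rule is needed. Below is a Python sketch. It assumes an extra node shape `("!", x)` for negation alongside `(op, left, right)`; `push_not` is a hypothetical name.

```python
def push_not(node):
    """Move every negation down to the leaves (negation normal form)."""
    if isinstance(node, str):
        return node
    if node[0] == "!":
        inner = node[1]
        if isinstance(inner, str):
            return node                                  # "not leaf" stays as is
        if inner[0] == "!":
            return push_not(inner[1])                    # not (not x) -> x
        flipped = "|" if inner[0] == "&" else "&"        # De Morgan's laws
        return (flipped, push_not(("!", inner[1])), push_not(("!", inner[2])))
    op, left, right = node
    return (op, push_not(left), push_not(right))
```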
You may also run into self-cancelling terms, so you need to handle those:

 rule self_cancel_and(a: term): term->term
     "  \a and not \a " -> "false";

 rule self_cancel_or(a: term): term->term
     "  \a or not \a " -> "true";

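In the flattened list-of-conjunctions form, `self_cancel_and` simply removes every conjunction that contains both an id and its negation. Below is a Python sketch; `drop_contradictions` is a hypothetical name. It assumes, purely for this illustration, that negated ids are written as strings with a leading `"!"`.

```python
def drop_contradictions(clauses):
    """Remove conjunctions containing some id x together with "!x".

    This is self_cancel_and (a and not a -> false) on a list of conjunctions;
    a false conjunction contributes nothing to the surrounding OR.
    """
    result = []
    for clause in clauses:
        literals = set(clause)
        contradictory = any("!" + lit in literals
                            for lit in literals if not lit.startswith("!"))
        if not contradictory:
            result.append(clause)
    return result
```

An empty result means the whole expression is false.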
and rules to get rid of true and false:

 rule and_true(a: term): term->term
     " \a and true " -> " \a ";

 rule and_false(a: term): term->term
     " \a and false " -> " false ";

 rule or_true(a: term): term->term
     " \a or true " -> " true ";

 rule or_false(a: term): term->term
     " \a or false " -> " \a ";

 rule not_false(a: term): term->term
     " not false " -> " true ";

 rule not_true(a: term): term->term
     " not true " -> " false ";

(I am assuming an expression precedence in which "not" binds more tightly than "and", which in turn binds more tightly than "or".)
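The true/false rules are the usual constant folding. Below is a Python sketch; `fold_constants` is a hypothetical name. It assumes Python's `True`/`False` stand for the constants, and the node shapes are `(op, left, right)` and `("!", x)`.

```python
def fold_constants(node):
    """Remove True/False from a tree using the and_/or_/not_ true/false rules."""
    if isinstance(node, (str, bool)):
        return node
    if node[0] == "!":
        inner = fold_constants(node[1])
        # not true -> false, not false -> true
        return (not inner) if isinstance(inner, bool) else ("!", inner)
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    for a, b in ((left, right), (right, left)):
        if isinstance(b, bool):
            if op == "&":
                return a if b else False    # a & true -> a ; a & false -> false
            return True if b else a         # a | true -> true ; a | false -> a
    return (op, left, right)
```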

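The rules shown assume that the various subtrees are at most binary. In practice a node may have many children, as your examples show, so you also have to worry about associativity. If you want the subsumption and cancellation rules to really work, you must take commutativity into account as well.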
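One common way to handle associativity and commutativity by hand is to canonicalize. First flatten nested nodes with the same operator into one n-ary node, then sort the children, so that equal subterms compare equal.

Below is a Python sketch; `canonical` is a hypothetical name. Two choices are assumptions made for illustration: the n-ary tuple form `(op, child1, child2, ...)`, and ordering children by `repr`.

```python
def canonical(node):
    """Flatten nested same-operator nodes (associativity) and sort the
    children (commutativity) so that equal subterms compare equal."""
    if isinstance(node, str):
        return node
    op = node[0]
    children = []
    for child in map(canonical, node[1:]):
        if isinstance(child, tuple) and child[0] == op:
            children.extend(child[1:])     # (a & b) & c -> &(a, b, c)
        else:
            children.append(child)
    children.sort(key=repr)                # order-insensitive comparison
    return (op,) + tuple(children)
```

After canonicalization, `1000 & 1001` and `1001 & 1000` become identical tuples, so the subsumption rules can match them.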

If your subexpressions contain relational operators, you may find that "not" can be propagated into them implicitly, e.g.,

    " not ( x > y ) " ->  " x <= y "

You may also want to normalize your relational comparisons:

    "  x < y " ->  " not (x >= y) "

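Both of these rewrites are table lookups on the operator. Below is a Python sketch; `push_not_into_comparison` and `normalize_less` are hypothetical names. It assumes a comparison is stored as a tuple `(op, x, y)` and negation as `("!", x)`. These identities only hold for totally ordered values; for example, they fail for floating-point NaN.

```python
# Negation of each comparison operator (valid for totally ordered values).
NEGATED = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "==": "!=", "!=": "=="}


def push_not_into_comparison(cmp):
    """not (x op y) -> x op' y, e.g. not (x > y) -> x <= y."""
    op, x, y = cmp
    return (NEGATED[op], x, y)


def normalize_less(cmp):
    """x < y -> not (x >= y); other comparisons are left unchanged."""
    op, x, y = cmp
    if op == "<":
        return ("!", (">=", x, y))
    return cmp
```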
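Since you have implemented the tree in PHP, you will have to hand-code the equivalent of these rules, procedurally climbing up and down the tree. That is possible, but quite inconvenient. (You could do it on the tokens as RPN or on the AST, but I think you will find the AST easier, because you don't have to shuffle token strings around.)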
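The procedural equivalent is usually a small driver that applies rule functions bottom-up and repeats until nothing changes. Below is a Python sketch (in place of PHP) with only the distribution rule plugged in. The driver names, the convention that a rule returns `None` when it does not match, and the tuple node shape are all assumptions.

```python
def rewrite_once(node, rules):
    """One bottom-up pass: rewrite the children, then try each rule on the node."""
    if isinstance(node, tuple):
        node = (node[0],) + tuple(rewrite_once(child, rules) for child in node[1:])
    for rule in rules:
        replacement = rule(node)
        if replacement is not None:
            return replacement
    return node


def rewrite_to_fixpoint(node, rules):
    """Repeat bottom-up passes until the tree stops changing."""
    while True:
        new = rewrite_once(node, rules)
        if new == node:
            return node
        node = new


def distribute_and_over_or(node):
    """a & (b | c) -> (a & b) | (a & c); returns None when the rule does not match."""
    if isinstance(node, tuple) and node[0] == "&":
        a, bc = node[1], node[2]
        if isinstance(bc, tuple) and bc[0] == "|":
            return ("|", ("&", a, bc[1]), ("&", a, bc[2]))
    return None
```

More rules, such as a mirrored distribution rule or constant folding, would be added to the `rules` list in the same style.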

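When manipulating symbolic formulas, it is much easier to use an engine, typically a program transformation system, that accepts the rewrite rules directly and applies them for you. The notation I used here is taken from our DMS Software Reengineering Toolkit, which takes these rules directly and handles associativity and commutativity automatically. That is probably not a viable option in PHP.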

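One last caveat: if your terms are at all complex, the final disjunctive normal form can get very big, very fast. We had a client who wanted exactly this, until we ran it for him on a large starting term, which happened to produce hundreds of leaf conjunctions. (So far, we have not found a good way to present arbitrary Boolean terms.)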
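To see how quickly this grows, consider an AND of n two-way ORs. It expands into 2^n conjunctions, one for each way of picking one id from every OR group. Below is a Python sketch using `itertools.product`; the group data is made up for illustration, and `expand_and_of_ors` is a hypothetical name.

```python
from itertools import product


def expand_and_of_ors(groups):
    """DNF of (g1[0] | g1[1] | ...) & (g2[0] | ...) & ...

    Each conjunction picks one id from every OR group, so the result
    has len(g1) * len(g2) * ... conjunctions.
    """
    return [list(choice) for choice in product(*groups)]


groups = [["a%d" % i, "b%d" % i] for i in range(10)]   # 20 leaves in total
print(len(expand_and_of_ors(groups)))                  # 1024
```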

answered 2014-07-15T03:01:39.500

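Thanks for mentioning the keyword that helped me most: disjunctive normal form. I didn't know that this was the transformation I was actually looking for.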

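I couldn't find a detailed algorithm description on the internet, so I tried it myself. Here is what I did, in pseudocode. Please let me know if anything is unclear.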

- Traverse the AST recursively in post-order.
- If an &-node is found, check whether one of its children is a |-node.
- Set orChild and andChild accordingly.
- Traverse the orChild subtree iteratively in pre-order, and for each OR-leaf push a new &-node with andChild and the OR-leaf value onto the stack.
- If you encounter another &-node, push a new &-node with andChild and the whole &-node you found onto the stack.
- After the traversal is done, combine the nodes on the stack using |-nodes.
- The new subtree, which has an |-node as its root, replaces the &-node you started the traversal from.
- Because the outer traversal is post-order, the newly created nodes are not traversed and have no effect on further changes.
- Repeat the whole process until the resulting tree no longer changes.
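Here is a Python rendering of the steps above. It is a sketch, not the original code: Python stands in for the asker's PHP, and the tuple node shape `(op, left, right)` and the helper names are assumptions.

```python
def is_op(node, op):
    """True if node is an operator node with the given operator."""
    return isinstance(node, tuple) and node[0] == op


def or_operands(or_node):
    """Iterative pre-order walk of an |-subtree.

    Yields every node that is not itself an |-node: plain leaves as well as
    whole &-subtrees, which are not descended into.
    """
    stack = [or_node]
    while stack:
        current = stack.pop()
        if is_op(current, "|"):
            stack.append(current[2])   # push right first so left comes out first
            stack.append(current[1])
        else:
            yield current


def one_pass(node):
    """Post-order pass replacing each &-node that has an |-child."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node[0], one_pass(node[1]), one_pass(node[2])
    if op == "&" and (is_op(left, "|") or is_op(right, "|")):
        if is_op(left, "|"):
            or_child, and_child = left, right
        else:
            or_child, and_child = right, left
        new_nodes = [("&", and_child, item) for item in or_operands(or_child)]
        combined = new_nodes[0]
        for item in new_nodes[1:]:
            combined = ("|", combined, item)   # left-nested chain of |-nodes
        return combined                        # not revisited in this pass
    return (op, left, right)


def to_dnf(node):
    """Repeat passes until the tree no longer changes."""
    while True:
        new = one_pass(node)
        if new == node:
            return node
        node = new
```

On the two trees from the question, this yields exactly the target trees shown there. When both children of an &-node are |-nodes, the first pass distributes over one of them and a later pass handles the other. That is why the process repeats until the tree stops changing.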
answered 2014-07-15T14:03:50.377