1

The problem is the following:

I developed an expression evaluation engine that provides a XPath-like language to the user so he can build the expressions. These expressions are then parsed and stored as an expression tree. There are many kinds of expressions, including logic (and/or/not), relational (=, !=, >, <, >=, <=), arithmetic (+, -, *, /) and if/then/else expressions.

Besides these operations, the expression can have constants (numbers, strings, dates, etc) and also access to external information by using a syntax similar to XPath to navigate in a tree of Java objects.

Given the above, we can build expressions like:

/some/value and /some/other/value

/some/value or /some/other/value

if (<evaluate some expression>) then 
    <evaluate some other expression> 
else 
    <do something else>

Since the then-part and the else-part of the if-then-else expressions are expressions themselves, and everything is considered to be an expression, then anything can appear there, including other if-then-else's, allowing the user to build large decision trees by nesting if-then-else's.

As these expressions are built manually and prone to human error, I decided to build an automatic learning process capable of optimizing these expression trees based on the analysis of common external data. For example: in the first expression above (/some/value and /some/other/value), if the result of /some/other/value is false most of the times, we can rearrange the tree so this branch will be the left branch to take advantage of short-circuit evaluation (the right side of the AND is not evaluated since the left side already determined the result).

Another possible optimization is to rearrange nested if-then-else expressions (decision trees) so the most frequent path taken, based on the most common external data used, will be executed sooner in the future, avoiding unnecessary evaluation of some branches most of the times.

Do you have any ideas on what would be the best or recommended approach/algorithm to use to perform this automatic refactoring of these expression trees?

4

1 回答 1

1

I think what you are describing is compiler optimizations which is a huge subject with everything from

  • inline expansion
  • deadcode elimination
  • constant propagation
  • loop transformation

Basically you have a lot of rewrite rules that are guaranteed to preserve the functionality of the code/xpath.

In the question on rearranging of the nested if-else I don't think you need to resort to machine-learning. One (I think optimal) approach would be to use Huffman coding of your links huffman_coding Take each path as a letter and we then encode them with Huffman coding and get a so called Huffman tree. This tree will have the least evaluations running on a (large enough) sample with the same distribution you made the Huffman tree from.

If you have restrictions on ``evaluate some expression''-expresssion or that they have different computational cost etc. You probably need another approach.

And remember, as always when it comes to optimization you should be careful and only do things that really matter.

于 2013-01-06T11:12:47.563 回答