This is a great question. I'll give the full details on recursive
predicates below, but let's first deal with result
.
Also, how does this line work exactly?:
country = getANeighbor(result)
With some of the simple examples that are given in the CodeQL handbook, they make it seem like that the 'result' keyword almost acts
like 'return' in other languages. I feel that I have a fundamental
misunderstanding of what 'result' does and how it works. I can't seem
to find a good explanation after googling.
You're right to observe that in many examples result = f()
in CodeQL
seems to act like return f()
in other languages. However, a return
statement (with its usual meaning of actually interrupting the
evaluation of a function and returning to the caller) does not make
sense in a logic language, where it's up to the CodeQL evaluator how
and in what order to evaluate the different parts of a predicate you
define. Instead, when a predicate declares a return type (string
in
your example), this implicitly declares the special variable result
in its body, whose type is the declared return type of the predicate.
You can use this variable just like any other, in particular to
constrain it as part of the predicate.
Note that this
is quite similar to how you implicitly get a special
variable called this in a member predicate (or method, in traditional
languages).
Other than its implicit declaration, there is nothing special about
the result
variable. In particular, logically you could write the
getANeighbor
predicate like this, without its use:
predicate getANeighbor(string country, string neighbor) {
country = "France" and neighbor = "Belgium"
or
country = "France" and neighbor = "Germany"
or
country = "Germany" and neighbor = "Austria"
or
country = "Germany" and neighbor = "Belgium"
or
getANeighbor(neighbor, country)
}
You do have to invoke the predicate differently if you declare it
without a result type: getANeighbor(neighbor, country)
rather than
country = getANeighbor(result)
. Other than this syntactic detail,
the two are entirely equivalent.
If you get used to the idea that result
is just an implicitly
declared variable, not some special way of signaling the returned
value of a predicate, then code like country = getANeighbor(result)
(or more complex cases, where result
is used multiple times to place
multiple conditions on it) will make more sense. Contrast it to the
result
-free version I gave above, which explicitly flips the order
of arguments in the recursive call to getANeighbor
: it's exactly the
same thing, we're invoking getANeighbor
with "flipped" arguments,
namely result
as the first parameter, and country
equated to the
"returned" value.
As a final observation, you can of course get something more similar
to the return
-like style of many examples by introducing an
intermediate variable: Instead of country = getANeighbor(result)
, we
can write exists(string neighbor | country = getANeighbor(neighbor) and result = neighbor)
, but you can see how this ends up being more
verbose.
The general intuition behind recursion in CodeQL is given here,
but at a high level you can think of each recursive call as
representing the "current" set of value tuples in the called
predicate, and all predicate definitions keep being evaluated until
there is no change. Let's see how this works in your example
predicate:
string getANeighbor(string country)
{
country = "France" and result = "Belgium"
or
country = "France" and result = "Germany"
or
country = "Germany" and result = "Austria"
or
country = "Germany" and result = "Belgium"
or
country = getANeighbor(result)
}
The first time we try to evaluate getANeighbour
, we do not know
anything about it, and so we treat a call to it as a call to the empty
set of values. This means we deduce the following values for it:
+-----------+----------+---------+
| iteration | country | result |
+-----------+----------+---------+
| 1 | France | Belgium |
| 1 | France | Germany |
| 1 | Germany | Austria |
| 1 | Germany | Belgium |
+-----------+----------+---------+
However, after deducing these values, we know that there was a
recursive call to getANeighbor
, and that the values we have for it
changed. Thus, we need to re-evaluate the definition of
getANeighbor
, now using our expanded set of values for the recursive
call. This means that we deduce the following:
+-----------+----------+---------+
| iteration | country | result |
+-----------+----------+---------+
| 2 | France | Belgium |
| 2 | France | Germany |
| 2 | Germany | Austria |
| 2 | Germany | Belgium |
| | | |
| 2 | Belgium | France |
| 2 | Germany | France |
| 2 | Austria | Germany |
| 2 | Belgium | Germany |
+-----------+----------+---------+
Note that we re-deduced the first four lines, which do not depend on a
recursive call, just like we did in iteration 1. This is fine -- we
already knew that "Belgium" = getANeighbor("France")
was true, and
thus we can simply ignore the fact that we deduced it in another way.
However, the last four lines are new; they arise from the recursive
call.
As before, we see that iteration 2 of evaluating getANeighbor
depended on a recursive predicate for which we have deduced new
values, and so we need to go around again. In the third iteration, we
deduce:
- (a) the first four lines of iteration 2 (and iteration 1), just like before, from the non-recursive parts of the predicate definition;
- (b) the last four lines of iteration 2, in the same way as for iteration 2 itself; and
- (c) all 8 lines from iteration 2, just by applying the recursive case to the results of iteration 2. However, we didn't deduce any
new values, and so we know that at this point we can stop evaluation of
getANeighbor
. We say that it has "converged" (meaning "stopped
changing"), and for any other references to it, we are going to use
the final set of values which we computed.
This is why in your query, France
is reported as a neighbor of
Germany
.
[I've omitted some details of how to make such a recursive evaluation
efficient -- in practice there are some ways to avoid re-deducing the
same values again and again on each recursive iteration --, because
they are not necessary to understand how recursion works in the first
place.]