July 2, 2022

Okay, this post has been a long time coming. Initially I meant it to cover a generalisation of the the algorithm we developed last time. The plan was to do a hand wavy proof of correctness, and take hints from the proof process to come up with a more generic version of the algorithm.

That’s still the plan, but I wasn’t happy with the hand wavy part. So I decided to do a proper proof. But if something’s worth doing it’s definitely worth overdoing. Right? And that’s how I ended up writing a verified version of the algorithm in Dafny.

On to the proofy bit!

The algorithm works on singly linked lists implemented using C++
pointers. Working with mutable, dynamically allocated data structures
complicates the proof significantly. Since I don’t want to go down the
Separation
Logic rabbit hole we’ll make the following simplification: we treat
the linked lists as *finite sequences*.

It turns out Dafny uses a similar approach with an extra twist to ensure the lists aren’t circular.

The `Node`

class is what you’d expect–it has a value and a
next pointer–with added machinery to aid proofs and a convenience method
for constructing lists which I’ve omitted. I’ll get to them shortly.

```
class Node<T(0)>
{
var value: T;
var next: Node?<T>;
// Proof aides
static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
// Pre- and postcondition
{
// Implementation
}
}
```

If you’re wondering about the `(0)`

bit after the type
parameter, that tells Dafny that whatever type we supply it must be
“default constructible” in C++ parlance.

We’ll get to the implementation of `Cons`

in a bit, but
first let’s look at how Dafny deals with mutable linked lists. Since
linked lists are a recursive data structures our algorithms will have to
use either loops or recursion. In either case Dafny will have trouble
proving that the process terminates because we might be dealing with a
circular list. The basic approach to solving this difficulty is to keep
track of what nodes go where, and prove that a node can’t appear in the
list starting at its `next`

pointer. The idiom in Dafny seems
to be to define a *predicate* called `Valid`

together with some *ghost* fields to track the proof state for
the linked list. Unlike methods and regular fields, predicates and ghost
fields are only used for verification^{1}.

```
// Proof aides
ghost var Elems: seq<T>;
ghost var Repr: set<Node<T>>;
predicate Valid()
reads this, Repr
{
this in Repr &&
|Elems| > 0 && Elems[0] == value &&
(next == null ==> |Elems| == 1) &&
(next != null ==>
next in Repr && next.Repr <= Repr && this !in next.Repr &&
next.Valid() && next.Elems == Elems[1..])
}
```

`Elems`

tracks the elements of the list. `Repr`

is more interesting, and it’s the key ingredient for allowing us to work
with linked lists. It’s a set tracking the nodes in the representation
of the list^{2}. `Valid`

uses it to check
that a node doesn’t appear in the list that starts from it, that is,
that it’s not circular. It turns out this is enough to allow Dafny to
work out termination for loops involving linked lists made of
`Node`

s.

Now we can finally look at `Cons`

which is pretty much
what you’d expect plus some proof state bookkeeping.

```
static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
requires xs == null || xs.Valid()
ensures n.Valid()
ensures if xs == null then n.Elems == [x] else n.Elems == [x] + xs.Elems
{
n := new Node;
n.value, n.next := x, xs;
if xs == null
{
n.Elems := [x];
n.Repr := {n};
} else {
n.Elems := [x] + xs.Elems;
n.Repr := {n} + xs.Repr;
}
}
```

The precondition–introduced by `requires`

–asks for a valid
node, and the postconditions–introduced by `ensures`

–make
sure we leave the node in a valid state and that we’ve actually
prepended `x`

to `xs`

. The actual logic is the
first two lines. The `if`

keeps the proof state valid.

We’re now have all the ingredients to write and prove the intersection algortihm. The algorithm is similar to the one last time, but I’ve had to make a few adjustments to make Dafny happy.

Let’s start by looking at the pre- and post conditions.

```
method Intersects<T(0)>(a: Node?<T>, b: Node?<T>) returns (r: Node?<T>)
requires a == null || a.Valid()
requires b == null || b.Valid()
ensures a == null || b == null ==> r == null
ensures r == null || (r in ListRepr(a) && r in ListRepr(b))
{
// "Synchronise" lists heads
// Find the intersection point
}
```

The preconditions are fairly light, we want the lists to be valid. As mentioned above, this is key allowig Dafny to derive termination proofs for the inner loops.

By the way, the `?`

after `Node`

in the type
signatures means it’s a “nullable” type. That means we have to deal with
`null`

all over the place, but it models the original
algorithm better.

The postconditions say if either of the inputs is `null`

so will be the reult–fair enough, if any or both lists are empty they
have no intersection point–and that if the result is not
`null`

–we found an intersection point–it’s present in both
lists.

Now these may sound like reasonable postconditions, but the last one
is not as strong as it cold be. It would be better if we made sure the
intersection point is the *first* common node between the two
instead of a common node. It looks like it would take a bit more work to
convince Dafny to verify that postcondition so I decided to publish the
proof as it is. I might revisit this at some point in the future.

OK, it’s time to have a look at the first part of the algorithm where we “synchronise” the list heads so we can then walk them in lockstep.

```
// "Synchronise" lists heads
var m: nat := Length(a);
var n: nat := Length(b);
var pa, pb := a, b;
if m > n
{
while m > n
invariant pa != null ==> pa.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb)
invariant m >= n
decreases ListRepr(pa)
{
pa := pa.next;
m := m - 1;
}
assert m <= n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
} else if n > m {
while n > m
invariant pb != null ==> pb.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb)
invariant n >= m
decreases ListRepr(pb)
{
pb := pb.next;
n := n - 1;
}
assert n <= m;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
}
```

We keep the lengths of the lists in `m`

and `n`

respectively, and use `pa`

and `pb`

to iterate
over them. I won’t show the definition of `Length`

here, but
it’s what you’d expect.

I’ve had to add an explicit `if`

around the two loops to
convince Dafny that the loop invariants hold on entry in the loops. To
be honest I think that’s reasonable, and it makes the algorithm easier
to understand at the cost of an extra check.

The loop invariants make sure `m`

and `n`

stay
in sync with the lengths of the lists as `pa`

and
`pb`

. The important point here is that call to
`Valid`

together with the
`decreases ListRepr(...)`

clause allow Dafny to prove that
the list makes progress at each step, and eventually terminates. I’m
don’t know about you but I find this pretty cool.

The assertions are not strictly necessary, but I like to have sanity checks thrown in.

On to the main loop. Before we enter the loop we have *m* ≤ *n* ∧ *n* ≤ *m*.
Therefore *m* = *n*.
They’re also both in sync with length of the lists starting in
`pa`

and `pb`

.

```
// Find the intersection point
assert m == ElemCount(pa) && n == ElemCount(pb) && m == n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
while pa != pb
invariant pa != null ==> pa.Valid()
invariant pb != null ==> pb.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb) && m == n
decreases ListRepr(pa), ListRepr(pb)
{
pa := pa.next;
pb := pb.next;
m := m - 1;
n := n - 1;
}
assert pa == pb;
r := pa;
```

The loop invariants make sure that `m`

and `n`

stay in sync with the lists as well as with each other. This is why we
don’t have to explicitly check `pa`

and `pb`

for
`null`

in the loop test.

And with that we’re done. I’m still not 100% happy with the
postcondition on the algorithm. I think I would come up with a stronger
postcondition if I kept track of the pointers as sequences instead of
sets, like in the case of `Elems`

, and check that there’s no
earlier node that the sequences have in common before the result.
Anyway, this post is long enough as it is.

I’ve been meaning to play with theorem proving for a while, and this was a nice excuse to do so. It was a mostly enjoyable exercise, and Dafny seems like a nice language to start exploring the field.