# A verified algorithm for determining the intersection point of two lists in O(1) space

July 2, 2022

Okay, this post has been a long time coming. Initially I meant it to cover a generalisation of the the algorithm we developed last time. The plan was to do a hand wavy proof of correctness, and take hints from the proof process to come up with a more generic version of the algorithm.

That’s still the plan, but I wasn’t happy with the hand wavy part. So I decided to do a proper proof. But if something’s worth doing it’s definitely worth overdoing. Right? And that’s how I ended up writing a verified version of the algorithm in Dafny.

On to the proofy bit!

# A short aside on mutability

The algorithm works on singly linked lists implemented using C++ pointers. Working with mutable, dynamically allocated data structures complicates the proof significantly. Since I don’t want to go down the Separation Logic rabbit hole we’ll make the following simplification: we treat the linked lists as finite sequences.

It turns out Dafny uses a similar approach with an extra twist to ensure the lists aren’t circular.

# The linked list data structure

The `Node` class is what you’d expect–it has a value and a next pointer–with added machinery to aid proofs and a convenience method for constructing lists which I’ve omitted. I’ll get to them shortly.

``````class Node<T(0)>
{
var value: T;
var next: Node?<T>;

// Proof aides

static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
// Pre- and postcondition
{
// Implementation
}
}``````

If you’re wondering about the `(0)` bit after the type parameter, that tells Dafny that whatever type we supply it must be “default constructible” in C++ parlance.

We’ll get to the implementation of `Cons` in a bit, but first let’s look at how Dafny deals with mutable linked lists. Since linked lists are a recursive data structures our algorithms will have to use either loops or recursion. In either case Dafny will have trouble proving that the process terminates because we might be dealing with a circular list. The basic approach to solving this difficulty is to keep track of what nodes go where, and prove that a node can’t appear in the list starting at its `next` pointer. The idiom in Dafny seems to be to define a predicate called `Valid` together with some ghost fields to track the proof state for the linked list. Unlike methods and regular fields, predicates and ghost fields are only used for verification1.

``````  // Proof aides

ghost var Elems: seq<T>;
ghost var Repr: set<Node<T>>;

predicate Valid()
{
this in Repr &&
|Elems| > 0 && Elems == value &&
(next == null ==> |Elems| == 1) &&
(next != null ==>
next in Repr && next.Repr <= Repr && this !in next.Repr &&
next.Valid() && next.Elems == Elems[1..])
}``````

`Elems` tracks the elements of the list. `Repr` is more interesting, and it’s the key ingredient for allowing us to work with linked lists. It’s a set tracking the nodes in the representation of the list2. `Valid` uses it to check that a node doesn’t appear in the list that starts from it, that is, that it’s not circular. It turns out this is enough to allow Dafny to work out termination for loops involving linked lists made of `Node`s.

Now we can finally look at `Cons` which is pretty much what you’d expect plus some proof state bookkeeping.

``````  static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
requires xs == null || xs.Valid()
ensures n.Valid()
ensures if xs == null then n.Elems == [x] else n.Elems == [x] + xs.Elems
{
n := new Node;
n.value, n.next := x, xs;
if xs == null
{
n.Elems := [x];
n.Repr := {n};
} else {
n.Elems := [x] + xs.Elems;
n.Repr := {n} + xs.Repr;
}
}``````

The precondition–introduced by `requires`–asks for a valid node, and the postconditions–introduced by `ensures`–make sure we leave the node in a valid state and that we’ve actually prepended `x` to `xs`. The actual logic is the first two lines. The `if` keeps the proof state valid.

# The intersection algorithm

We’re now have all the ingredients to write and prove the intersection algortihm. The algorithm is similar to the one last time, but I’ve had to make a few adjustments to make Dafny happy.

Let’s start by looking at the pre- and post conditions.

``````method Intersects<T(0)>(a: Node?<T>, b: Node?<T>) returns (r: Node?<T>)
requires a == null || a.Valid()
requires b == null || b.Valid()
ensures a == null || b == null ==> r == null
ensures r == null || (r in ListRepr(a) && r in ListRepr(b))
{
// "Synchronise" lists heads

// Find the intersection point
}``````

The preconditions are fairly light, we want the lists to be valid. As mentioned above, this is key allowig Dafny to derive termination proofs for the inner loops.

By the way, the `?` after `Node` in the type signatures means it’s a “nullable” type. That means we have to deal with `null` all over the place, but it models the original algorithm better.

The postconditions say if either of the inputs is `null` so will be the reult–fair enough, if any or both lists are empty they have no intersection point–and that if the result is not `null`–we found an intersection point–it’s present in both lists.

Now these may sound like reasonable postconditions, but the last one is not as strong as it cold be. It would be better if we made sure the intersection point is the first common node between the two instead of a common node. It looks like it would take a bit more work to convince Dafny to verify that postcondition so I decided to publish the proof as it is. I might revisit this at some point in the future.

OK, it’s time to have a look at the first part of the algorithm where we “synchronise” the list heads so we can then walk them in lockstep.

``````  // "Synchronise" lists heads

var m: nat := Length(a);
var n: nat := Length(b);
var pa, pb := a, b;

if m > n
{
while m > n
invariant pa != null ==> pa.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb)
invariant m >= n
decreases ListRepr(pa)
{
pa := pa.next;
m := m - 1;
}

assert m <= n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
} else if n > m {
while n > m
invariant pb != null ==> pb.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb)
invariant n >= m
decreases ListRepr(pb)
{
pb := pb.next;
n := n - 1;
}

assert n <= m;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
}``````

We keep the lengths of the lists in `m` and `n` respectively, and use `pa` and `pb` to iterate over them. I won’t show the definition of `Length` here, but it’s what you’d expect.

I’ve had to add an explicit `if` around the two loops to convince Dafny that the loop invariants hold on entry in the loops. To be honest I think that’s reasonable, and it makes the algorithm easier to understand at the cost of an extra check.

The loop invariants make sure `m` and `n` stay in sync with the lengths of the lists as `pa` and `pb`. The important point here is that call to `Valid` together with the `decreases ListRepr(...)` clause allow Dafny to prove that the list makes progress at each step, and eventually terminates. I’m don’t know about you but I find this pretty cool.

The assertions are not strictly necessary, but I like to have sanity checks thrown in.

On to the main loop. Before we enter the loop we have m ≤ n ∧ n ≤ m. Therefore m = n. They’re also both in sync with length of the lists starting in `pa` and `pb`.

``````  // Find the intersection point

assert m == ElemCount(pa) && n == ElemCount(pb) && m == n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);

while pa != pb
invariant pa != null ==> pa.Valid()
invariant pb != null ==> pb.Valid()
invariant m == ElemCount(pa) && n == ElemCount(pb) && m == n
decreases ListRepr(pa), ListRepr(pb)
{
pa := pa.next;
pb := pb.next;
m := m - 1;
n := n - 1;
}

assert pa == pb;

r := pa;``````

The loop invariants make sure that `m` and `n` stay in sync with the lists as well as with each other. This is why we don’t have to explicitly check `pa` and `pb` for `null` in the loop test.

And with that we’re done. I’m still not 100% happy with the postcondition on the algorithm. I think I would come up with a stronger postcondition if I kept track of the pointers as sequences instead of sets, like in the case of `Elems`, and check that there’s no earlier node that the sequences have in common before the result. Anyway, this post is long enough as it is.

I’ve been meaning to play with theorem proving for a while, and this was a nice excuse to do so. It was a mostly enjoyable exercise, and Dafny seems like a nice language to start exploring the field.

1. In logic expressions the operators have the same meaning as in C. `|...|` stands for the length of a collection of things, and `==>` for logical implication.↩︎

2. Actually each node has its own set that is a subset of the one in the node pointing to it. Kind of like Russian nesting dolls.↩︎