July 2, 2022

Okay, this post has been a long time coming. Initially I meant it to cover a generalisation of the algorithm we developed last time. The plan was to do a hand-wavy proof of correctness, and take hints from the proof process to come up with a more generic version of the algorithm.

That’s still the plan, but I wasn’t happy with the hand-wavy part. So I decided to do a proper proof. But if something’s worth doing it’s definitely worth overdoing. Right? And that’s how I ended up writing a verified version of the algorithm in Dafny.

On to the proofy bit!

The algorithm works on singly linked lists implemented using C++ pointers. Working with mutable, dynamically allocated data structures complicates the proof significantly. Since I don’t want to go down the Separation Logic rabbit hole we’ll make the following simplification: we treat the linked lists as *finite sequences*.

It turns out Dafny uses a similar approach with an extra twist to ensure the lists aren’t circular.

The `Node` class is what you’d expect–it has a value and a next pointer–with added machinery to aid proofs and a convenience method for constructing lists, whose bodies I’ve omitted here. I’ll get to them shortly.

```
class Node<T(0)>
{
    var value: T;
    var next: Node?<T>;

    // Proof aides

    static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
        // Pre- and postcondition
    {
        // Implementation
    }
}
```

If you’re wondering about the `(0)` bit after the type parameter, that tells Dafny that whatever type we supply must be “default constructible” in C++ parlance.

We’ll get to the implementation of `Cons` in a bit, but first let’s look at how Dafny deals with mutable linked lists. Since linked lists are a recursive data structure our algorithms will have to use either loops or recursion. In either case Dafny will have trouble proving that the process terminates because we might be dealing with a circular list. The basic approach to solving this difficulty is to keep track of what nodes go where, and prove that a node can’t appear in the list starting at its `next` pointer. The idiom in Dafny seems to be to define a *predicate* called `Valid` together with some *ghost* fields to track the proof state for the linked list. Unlike methods and regular fields, predicates and ghost fields are only used for verification^{1}.

```
// Proof aides
ghost var Elems: seq<T>;
ghost var Repr: set<Node<T>>;

predicate Valid()
    reads this, Repr
{
    this in Repr &&
    |Elems| > 0 && Elems[0] == value &&
    (next == null ==> |Elems| == 1) &&
    (next != null ==>
        next in Repr && next.Repr <= Repr && this !in next.Repr &&
        next.Valid() && next.Elems == Elems[1..])
}
```

`Elems` tracks the elements of the list. `Repr` is more interesting, and it’s the key ingredient for allowing us to work with linked lists. It’s a set tracking the nodes in the representation of the list^{2}. `Valid` uses it to check that a node doesn’t appear in the list that starts from it, that is, that it’s not circular. It turns out this is enough to allow Dafny to work out termination for loops involving linked lists made of `Node`s.

Now we can finally look at `Cons`, which is pretty much what you’d expect plus some proof state bookkeeping.

```
static method Cons(x: T, xs: Node?<T>) returns (n: Node<T>)
    requires xs == null || xs.Valid()
    ensures n.Valid()
    ensures if xs == null then n.Elems == [x] else n.Elems == [x] + xs.Elems
{
    n := new Node;
    n.value, n.next := x, xs;
    if xs == null {
        n.Elems := [x];
        n.Repr := {n};
    } else {
        n.Elems := [x] + xs.Elems;
        n.Repr := {n} + xs.Repr;
    }
}
```

The precondition–introduced by `requires`–asks for a valid node, and the postconditions–introduced by `ensures`–make sure we leave the node in a valid state and that we’ve actually prepended `x` to `xs`. The actual logic is the first two lines. The `if` keeps the proof state valid.

We now have all the ingredients to write and prove the intersection algorithm. The algorithm is similar to the one last time, but I’ve had to make a few adjustments to make Dafny happy.

Let’s start by looking at the pre- and postconditions.

```
method Intersects<T(0)>(a: Node?<T>, b: Node?<T>) returns (r: Node?<T>)
    requires a == null || a.Valid()
    requires b == null || b.Valid()
    ensures a == null || b == null ==> r == null
    ensures r == null || (r in ListRepr(a) && r in ListRepr(b))
{
    // "Synchronise" lists heads
    // Find the intersection point
}
```

The preconditions are fairly light: we want the lists to be valid. As mentioned above, this is key to allowing Dafny to derive termination proofs for the inner loops.

By the way, the `?` after `Node` in the type signatures means it’s a “nullable” type. That means we have to deal with `null` all over the place, but it models the original algorithm better.

The postconditions say that if either of the inputs is `null` so will be the result–fair enough, if one or both lists are empty they have no intersection point–and that if the result is not `null`–we found an intersection point–it’s present in both lists.

Now these may sound like reasonable postconditions, but the last one is not as strong as it could be. It would be better if we made sure the intersection point is the *first* common node between the two lists instead of just a common node. It looks like it would take a bit more work to convince Dafny to verify that postcondition so I decided to publish the proof as it is. I might revisit this at some point in the future.

OK, it’s time to have a look at the first part of the algorithm where we “synchronise” the list heads so we can then walk them in lockstep.

```
// "Synchronise" lists heads
var m: nat := Length(a);
var n: nat := Length(b);
var pa, pb := a, b;
if m > n {
    while m > n
        invariant pa != null ==> pa.Valid()
        invariant m == ElemCount(pa) && n == ElemCount(pb)
        invariant m >= n
        decreases ListRepr(pa)
    {
        pa := pa.next;
        m := m - 1;
    }
    assert m <= n;
    assert pa != null ==> pa in ListRepr(a);
    assert pb != null ==> pb in ListRepr(b);
} else if n > m {
    while n > m
        invariant pb != null ==> pb.Valid()
        invariant m == ElemCount(pa) && n == ElemCount(pb)
        invariant n >= m
        decreases ListRepr(pb)
    {
        pb := pb.next;
        n := n - 1;
    }
    assert n <= m;
    assert pa != null ==> pa in ListRepr(a);
    assert pb != null ==> pb in ListRepr(b);
}
```

We keep the lengths of the lists in `m` and `n` respectively, and use `pa` and `pb` to iterate over them. I won’t show the definition of `Length` here, but it’s what you’d expect.

I’ve had to add an explicit `if` around the two loops to convince Dafny that the loop invariants hold on entry into the loops. To be honest I think that’s reasonable, and it makes the algorithm easier to understand at the cost of an extra check.

The loop invariants make sure `m` and `n` stay in sync with the lengths of the lists starting at `pa` and `pb`. The important point here is that the call to `Valid` together with the `decreases ListRepr(...)` clause allows Dafny to prove that the loop makes progress at each step, and eventually terminates. I don’t know about you but I find this pretty cool.

The assertions are not strictly necessary, but I like to have sanity checks thrown in.

On to the main loop. Before we enter the loop we have *m* ≤ *n* ∧ *n* ≤ *m*, therefore *m* = *n*. They’re also both in sync with the lengths of the lists starting at `pa` and `pb`.

```
// Find the intersection point
assert m == ElemCount(pa) && n == ElemCount(pb) && m == n;
assert pa != null ==> pa in ListRepr(a);
assert pb != null ==> pb in ListRepr(b);
while pa != pb
    invariant pa != null ==> pa.Valid()
    invariant pb != null ==> pb.Valid()
    invariant m == ElemCount(pa) && n == ElemCount(pb) && m == n
    decreases ListRepr(pa), ListRepr(pb)
{
    pa := pa.next;
    pb := pb.next;
    m := m - 1;
    n := n - 1;
}
assert pa == pb;
r := pa;
```

The loop invariants make sure that `m` and `n` stay in sync with the lists as well as with each other. This is why we don’t have to explicitly check `pa` and `pb` for `null` in the loop test.

And with that we’re done. I’m still not 100% happy with the postcondition on the algorithm. I think I could come up with a stronger postcondition if I kept track of the pointers as sequences instead of sets, like in the case of `Elems`, and checked that there’s no earlier node that the sequences have in common before the result. Anyway, this post is long enough as it is.

I’ve been meaning to play with theorem proving for a while, and this was a nice excuse to do so. It was a mostly enjoyable exercise, and Dafny seems like a nice language to start exploring the field.

May 21, 2022

This was meant to be a single post outlining an approach for determining the intersection point of two singly-linked lists in constant space and its generalisation to work with other data structures. Since the result would be a relatively lengthy read I’ll publish the first part covering the linked list algorithm in this post, and I’ll leave the generalisation for a future post.

**May 24, 2022:** Laurențiu pointed out an error in the
original version of the C++ implementation. The last loop was comparing
the node values instead of the node identities. I’ve updated the
implementation accordingly.

A while back a friend shared an
interesting problem with me^{1}: given the heads of two
singly-linked lists, it asks for an algorithm to find the node at which
they intersect. We can assume the input lists don’t have cycles. The
linked lists must retain their structure after the algorithm is
finished.

One obvious solution is to store the first list’s nodes in a set or similar data structure and check if any of the nodes in the second list are present in it.

To make things more interesting my friend also asked if this can be done in constant space.

If the lists were the same length this would be easy to solve in constant space: just walk over the two lists at the same time and compare the elements. Since our lists can be any length we can’t use this approach.

What we do know is that our lists don’t have loops^{2}.
Can we take advantage of this observation in any way? Since the
lists don’t have loops they have an end. What if we walked the lists
backwards comparing each element? That would work, but walking the lists
backwards is awkward because they are singly-linked. The constraint on
space rules out any additional data structures where we could store info
about the reversed lists. If only we had doubly linked lists.

Hm, maybe there’s a way forward: it’s not a doubly-linked list we really want, but the back pointers. What if we traversed each list reversing pointers along the way? We’d be left with two singly linked lists that we can then traverse to compare elements. Looks like we have a solution.

But the constraints ask for the structure of the lists to be left unchanged after the algorithm finishes. So we’d have to walk the lists again reversing pointers to their original direction. This works, but it’s getting complicated. Not to mention that we have to traverse each list 3 times. Can we come up with a simpler solution?

Why did we want to start at the end of the lists? Because the lists have different lengths so we don’t know how many nodes to skip from each until they get “in sync”, and we can start comparing nodes.

There’s an assumption we’re making here–and I think the original problem makes the same assumption implicitly. Namely, that the two lists, once they intersect, have the same elements. That’s an important assumption, and it’s what allows the approach outlined above to work. If the lists just intersected in one element, and then went their separate ways, we’d face the same problem as when we wanted to start looking from the beginning of the lists. Keep in mind that I’m making this assumption moving forward.

Going back to walking the lists in reverse, we can look at it in a slightly different light: we’re walking a list that branches into two lists at some element. That’s the element we want to find. We don’t care about the nodes that come after. That’s a useful insight.

Our initial problem was that we didn’t know when the lists would get in sync so we can start comparing elements. But using our insight, it seems safe to just skip elements from the lists until each has an equal number of elements, that is, until we’re at the same distance from the split point on both branches. Then we can just compare the elements, and we’re done. Note that this approach still works when the two lists don’t intersect (why?).

Given two pointers *A* and *B* to singly-linked lists, the process we described can be implemented by the following algorithm:

- [Determine list lengths.] Set *M* ← length(*A*), and *N* ← length(*B*).
- [Sync A’s starting point with B.] If *M* > *N*, then set *A* ← next(*A*), *M* ← *M* − 1 and repeat this step.
- [Sync B’s starting point with A.] If *N* > *M*, then set *B* ← next(*B*), *N* ← *N* − 1 and repeat this step.
- [Search for first common node in lock-step.] If *A* ≠ NULL ∧ *B* ≠ NULL ∧ *A* ≠ *B*, set *A* ← next(*A*), *B* ← next(*B*) and repeat this step.
- [Done.] Return *A*.

(Why does it just work to return *A*?)

It’s finally time to write some code. Since the focus is on the intersection point algorithm we’ll define the simplest `node` structure we can work with:

```
template<typename T>
struct node
{
    T value;
    node* next;
};
```

Yes, it uses raw pointers which goes against modern C++ best practices. This is not production code.

Determining the length of a list is a straightforward task:

```
template<typename T>
int length(node<T>* a)
{
    int n = 0;
    while (a != nullptr) {
        ++n;
        a = a->next;
    }
    return n;
}
```

With the boilerplate out of the way, the algorithm can be implemented as follows:

```
template<typename T>
node<T>* intersection_point(node<T>* a, node<T>* b)
{
    auto m = length(a);
    auto n = length(b);
    while (m > n) {
        a = a->next;
        --m;
    }
    while (n > m) {
        b = b->next;
        --n;
    }
    while (a != nullptr && b != nullptr && a != b) {
        a = a->next;
        b = b->next;
    }
    return a;
}
```

A previous version of the post linked to the problem on a competitive coding site. Since then I’ve removed the link.↩︎

There is another implicit assumption that the lists are finite. Since computers have a limited amount of memory this might sound pedantic, but it’s going to be important in the next post.↩︎

January 30, 2022

Recently, a friend wrote about self-modifying executables in Rust. I was curious if I could do the same thing in Go. The short answer is “yes”. Read on if you’re curious how I did it.

Let’s start with a skeleton program, and we’ll add the patching code later:

```
package main

import (
	"fmt"
)

var CNT = 0xCAFEBABE

func main() {
	fmt.Println(CNT)
	// Patch the binary to increment CNT
}
```

We’ll poke around the ELF file from the command line, so I initialised `CNT` to an easy to find value.

The Rust program uses the `link_section` attribute to give the counter variable a section all of its own. That’s quite neat, but Go doesn’t have anything like that as far as I can tell. While having the variable live in a dedicated section makes it easier to find, it’s not essential to the solution. After all the program must find it somehow. And the way it does that is by going through the *symbol table*.

Let’s use the `strings` program to find `CNT`’s full name in the binary:

```
$ strings self-modify | grep CNT
stack=[cgocheckdeadlockmain.CNTno anodepollDescrunnablerwmutexRrwmutexWscavengetraceBufunknown( (forced) -> node= B exp.) B work ( blocked= in use)
main.CNT
runtime.x86HasPOPCNT
```

OK, we’re looking for the symbol named `main.CNT`. `readelf` is a handy tool for analysing ELF binaries. We can list the entries in our executable’s symbol table, and have a look at `main.CNT`:

```
$ readelf -s self-modify
Num: Value Size Type Bind Vis Ndx Name
...
1706: 00000000005472a8 8 OBJECT GLOBAL DEFAULT 9 main.CNT
...
```

The entry holds all the details we need to find `CNT` in the binary: the section index (`Ndx`), the variable size, and the location of its storage (`Value`). One thing to keep in mind is that the location is not an offset in the executable file, but a *virtual address*.

Using `readelf` we can inspect section 9:

```
$ readelf --sections self-modify
There are 23 section headers, starting at offset 0x1c8:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[ 9] .noptrdata PROGBITS 00000000005471c0 001471c0
0000000000010a60 0000000000000000 WA 0 0 32
```

The object is placed in the `.noptrdata` section, which is a Go specific section. But we’re not really bothered by that. What we care about is the `Address` field. That’s the virtual address of the section. Using the symbol table entry’s virtual address and the section’s virtual address we can work out our object’s offset in the section:

```
$ python3 -c 'print(0x00000000005472a8 - 0x00000000005471c0)'
232
```

The final step is to find the object’s physical location. This is where we use the section offset. We add the object offset to the section offset to find out where the storage for our counter is in the executable. Let’s check our logic:

```
$ python3 -c 'print(0x001471c0 + 232)'
1340072
$ hexdump -s 1340072 -n 8 self-modify
01472a8 babe cafe 0000 0000
```

Cool, it works. Now we can write some code to do the patching.

The full program is just the steps we ran above in the shell translated to Go (plus error handling, more on that later). Go’s standard library comes with a debug/elf package which makes reading ELF files almost as convenient as using `readelf`.

```
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"io"
	"log"
	"os"
)

var CNT = 1

type entry struct {
	value, off uint64
}

func getEntry(f *elf.File, name string) (*entry, error) {
	syms, err := f.Symbols()
	if err != nil {
		return nil, err
	}
	for _, s := range syms {
		if s.Name == name {
			sect := f.Sections[s.Section]
			bs, _ := sect.Data()
			varOff := s.Value - sect.Addr
			return &entry{f.ByteOrder.Uint64(bs[varOff : varOff+s.Size]), sect.Offset + varOff}, nil
		}
	}
	return nil, fmt.Errorf("can't find symbol '%s'", name)
}

func main() {
	fmt.Println(CNT)

	// Patch the binary to increment CNT
	exeName := os.Args[0]
	tmpName := exeName + ".tmp"

	f, err := os.Open(exeName)
	if err != nil {
		log.Fatalf("can't open file '%s': %v", exeName, err)
	}
	defer f.Close()
	data, _ := io.ReadAll(f)
	elfFile, err := elf.NewFile(bytes.NewReader(data))
	if err != nil {
		log.Fatalf("can't read ELF file: %v", err)
	}
	entry, err := getEntry(elfFile, "main.CNT")
	if err != nil {
		log.Fatalf("can't find counter object in ELF file: %v", err)
	}
	elfFile.ByteOrder.PutUint64(data[entry.off:], entry.value+1)

	fi, err := f.Stat()
	if err != nil {
		log.Fatalf("can't get file mode for '%s': %v", os.Args[0], err)
	}
	if err := os.WriteFile(tmpName, data, fi.Mode()); err != nil {
		log.Fatalf("can't write file '%s': %v", tmpName, err)
	}
	if err := os.Rename(tmpName, os.Args[0]); err != nil {
		log.Fatalln("can't rename temporary file", err)
	}
}
```

The most obvious caveat is that this only works for ELF files on Linux (it might work on other operating systems using ELF, but I haven’t tested it).

The code also assumes that the compiler will always reserve storage for the variable in the object file. At the time of writing, under Go 1.17, that’s the case. And since this is an exported variable the compiler won’t inline it. *But* this is the only module. And the variable is not used anywhere else. If in a later version the compiler starts doing some clever cross module analysis it may well decide to inline the value in the one place it’s used. Fun exercise: try changing the `var` to a `const` and see what happens.

Why did we initialise `CNT` to `1` and not `0`? If we did initialise it to `0` the compiler would put it in the `.bss` section, which is treated specially by the program loader. The executable only stores the length of the segment, not the data. The loader allocates the data when it loads the program. That means there’s nothing for us to patch.

I do like explicit error handling, but even so, I find the above code verbose to the point of making the main logic hard to follow. You could argue that the steps should be encapsulated in functions with the appropriate error handling. And that’s true. But even so you’d end up with a handful of `mustX` functions, which seems to be a common pattern in Go.

Instead, I decided to give the Go 1.18 beta1 with generics a go. The program below defines two generic helpers–`mustDo` and `must`. The former is meant to wrap functions that are only performed for their side effects, and only return an `error` object. This can actually be written in Go 1.17 since it doesn’t use any generic parameters. `must` on the other hand is meant to wrap functions that return a value and an `error` object. These make the code considerably easier to read. If you’re not familiar with Go’s idioms, `log.Fatalln` is equivalent to calling `log.Println` followed by a call to `os.Exit(1)`.

```
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"io"
	"log"
	"os"
)

var CNT = 1

type entry struct {
	value, off uint64
}

func mustDo(err error) {
	if err != nil {
		log.Fatalln(err)
	}
}

func must[T any](t T, err error) T {
	if err != nil {
		log.Fatalln(err)
	}
	return t
}

func getEntry(f *elf.File, name string) (*entry, error) {
	syms := must(f.Symbols())
	for _, s := range syms {
		if s.Name == name {
			sect := f.Sections[s.Section]
			bs, _ := sect.Data()
			varOff := s.Value - sect.Addr
			return &entry{f.ByteOrder.Uint64(bs[varOff : varOff+s.Size]), sect.Offset + varOff}, nil
		}
	}
	return nil, fmt.Errorf("can't find symbol '%s'", name)
}

func main() {
	fmt.Println(CNT)

	exeName := os.Args[0]
	tmpName := exeName + ".tmp"

	f := must(os.Open(exeName))
	defer f.Close()
	data, _ := io.ReadAll(f)
	elfFile := must(elf.NewFile(bytes.NewReader(data)))

	entry := must(getEntry(elfFile, "main.CNT"))
	elfFile.ByteOrder.PutUint64(data[entry.off:], entry.value+1)

	fi := must(f.Stat())
	mustDo(os.WriteFile(tmpName, data, fi.Mode()))
	mustDo(os.Rename(tmpName, os.Args[0]))
}
```