February 18, 2023

**February 28, 2023:** @tgehr posted
a zero-based version of the algorithm. They also pointed out that my
original code passed the static arrays by value. I’ve updated the code
in the post to use `ref` for the array parameters as well as
D’s exponentiation operator, `^^`. @tgehr’s version also returns the first
position of the element if there are multiple instances, while the one
in this post returns the last one. Thanks for the suggestions!

**February 18, 2023:** @schveiguy pointed
out that the initial approach to determining the static array size
was overly complicated. I’ve updated the code to incorporate their
suggestion. Thanks!

Recently, while I was reading “Writing Efficient Programs”, I came across a beautiful variation of binary search and its implementation. In this post I’ll be using D’s metaprogramming capabilities to implement a generic version of the algorithm.

For the rest of the post I’m going to assume you’re familiar with binary search, and its most common implementation–the bisection method using two pointers.

There are many ways to implement a binary search. The most common is probably using a “bisection search”, where we track the subintervals using two pointers, one moving from left to right, the other from right to left.

Another variant is the “uniform search”, where instead of using two pointers, we use only a single pointer and a “rate of change”, i.e., the start and the size of the subintervals. This is more subtle than bisection search–which is not trivial by the way. In Knuth’s words:

> It is possible to do this, but only if extreme care is paid to the details, […]. Simpler approaches are doomed to failure.

On the other hand, uniform search has some advantages. One of them is that the rates of change can be precalculated and stored in a side table. If we get the rate of change calculation right–which is the subtle part–the search algorithm is simpler than its cousin using two pointers.

A variation of uniform search is Shar’s algorithm. It does away with the side table, and uses power of two interval sizes.

It starts by comparing the key we are looking for, *K*, with *K*_{i}, where *i* = 2^{k} and *k* = ⌊lg *N*⌋.
If *K* < *K*_{i},
the interval sizes are 2^{k − 1}, 2^{k − 2}, …, 1, 0. But if *K* > *K*_{i},
we set *i* = *N* − 2^{l} + 1,
where *l* = ⌈lg(*N* − 2^{k} + 1)⌉,
adjust the interval pointer, and the interval sizes are 2^{l − 1}, 2^{l − 2}, …, 1, 0.
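
To make this concrete, here is the arithmetic for *N* = 1000, the table size used below: *k* = ⌊lg 1000⌋ = 9, so the first comparison is against *K*_{512}. If *K* > *K*_{512}, then *l* = ⌈lg(1000 − 512 + 1)⌉ = ⌈lg 489⌉ = 9, and *i* = 1000 − 2^{9} + 1 = 489, after which the interval sizes are 256, 128, …, 1, 0.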

Shar’s algorithm determines the position of the entry with key *K* bit by bit. Each test adds one
more bit. We need one more test, after we’ve determined all the bits, to
see if the entry is actually in the table.

Bentley provides a beautiful version of Shar’s algorithm in his book. The code works for tables with 1000 elements. The code in the book is written in Pascal, but transliterated to D it looks like this:

```
int bsearch1000(ref int[1001] xs, int x)
{
    auto i = 0;
    if (xs[512] <= x) { i = 1000 - 512 + 1; }
    if (xs[i + 256] <= x) { i += 256; }
    if (xs[i + 128] <= x) { i += 128; }
    if (xs[i + 64] <= x) { i += 64; }
    if (xs[i + 32] <= x) { i += 32; }
    if (xs[i + 16] <= x) { i += 16; }
    if (xs[i + 8] <= x) { i += 8; }
    if (xs[i + 4] <= x) { i += 4; }
    if (xs[i + 2] <= x) { i += 2; }
    if (xs[i + 1] <= x) { i += 1; }
    if (xs[i] == x) return i;
    else return 0;
}
```

There are a few things going on here. First, the odd array size, 1001.
Pascal arrays are one-based. D follows in C’s footsteps with zero-based
arrays. We just ignore `xs[0]` in this case. This is a bug by
the way. Bentley acknowledges it, but he doesn’t provide a fix since he
considers it would detract from the exposition. We can fix it by setting
`i` to `1` initially and making the necessary
code adjustments, which would complicate the code somewhat. Another way
to fix it is by explicitly checking if `i` is `0` in
the last test.
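
The second fix amounts to a one-line change to `bsearch1000`’s final test (a sketch of my own, not Bentley’s code):

```
    // Guard against the ignored slot 0 being read and reported as a match.
    if (i > 0 && xs[i] == x) return i;
    else return 0;
```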

Second, the code fully unrolls the search loop. This is only possible because we know the size of the table beforehand. The code can be adjusted for other table sizes as needed.

What makes this code beautiful is that it’s about as efficient as it could be. It’s also uniform, and relatively easy to understand if you know about Shar’s algorithm. It’s an almost word for word rendition of the algorithm tailored for this particular table size.

Beautiful as it is, Bentley’s code is somewhat tedious to adjust to
other table sizes, and it’s easy to make mistakes while calculating the
initial value of `i`. The code is very repetitive and
uniform. This is a strong hint that we can automate writing it.

This is where D’s powerful metaprogramming capabilities come in. If we can determine the size of the table at compile time, we could in principle generate the code for the unrolled loop automatically.

As it turns out, we can determine if we’re dealing with a static array, and get its size at compile time. Before I show you the code, let’s break the problem down. The algorithm has three parts:

1. The initial test, comparing the search key *K* with *K*_{i} for *i* = 2^{k}, *k* = ⌊lg *N*⌋;
2. Determining successive bits of the candidate position by iterating over powers of two; and
3. Checking if we found the element we are looking for.

Let’s start with the function signature:

```
auto bsearch(T, size_t n)(ref T[n] xs, T x)
{
```

This is really neat. I didn’t expect the D compiler to deduce
`n`–C++ won’t–but it worked just fine. `T` is the
array element type, and `n` is the array’s length. This is a
static value, available at compile time.

Then we determine *k*, the
power of two where we start the search:

```
    enum k = iflog2(n - 1);
```

`iflog2` stands for *i*nteger *f*loor
*log*arithm in base *2*. It’s a regular D function, but it
can be evaluated at compile time when called with a compile-time value,
which is what we do here. Using an `enum` is very similar to
C++. Since our array is one-based we subtract one from its length.
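
The post doesn’t show `iflog2`’s definition, so here is a minimal sketch of how it might look (an assumption of mine, not the original code):

```
// Integer floor of the base-2 logarithm, e.g. iflog2(1000) == 9.
// A regular function, but evaluable at compile time (CTFE).
int iflog2(size_t n)
{
    int k = 0;
    while (n > 1)
    {
        n >>= 1;
        ++k;
    }
    return k;
}
```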

The function parameters are the table `xs` and the key
`x` we are looking for.^{1} We’re passing
`xs` by `ref` so we don’t copy the whole array
when we call `bsearch`.

The initial test is just the code rendition of the test in Shar’s algorithm:

```
    auto p = 0;
    if (xs[2 ^^ k] <= x) p = (n - 1) - 2 ^^ k + 1;
```

We track the position of the candidate element in `p`.

Now for the fun bit, generating the power of two tests:

```
    static foreach_reverse (int i; 0 .. k)
    {
        if (xs[p + 2 ^^ i] <= x) p += 2 ^^ i;
    }
```

This code is remarkably short thanks to the problem’s regularity we
mentioned earlier, and to D’s powerful metaprogramming capabilities. A
`static foreach` is evaluated at compile time. And crucially,
it doesn’t introduce a new scope. The code inside is just “spliced” into
the code of the surrounding function. In effect, this snippet generates
a series of `if` statements equivalent to the ones in
`bsearch1000`. We use `foreach_reverse` to iterate
over the exponents from *k* − 1 down to
0–the range `0 .. k` excludes
`k`, and we’re iterating over it in reverse. The choice of
`foreach_reverse` as a keyword is somewhat unfortunate. There
may be a cleaner way of achieving the same result, but I don’t use D
regularly, so this is what I came up with :^).
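
For our running example, *n* = 1001 and so *k* = 9, which means the splice is equivalent to writing out:

```
    if (xs[p + 2 ^^ 8] <= x) p += 2 ^^ 8; // 256
    if (xs[p + 2 ^^ 7] <= x) p += 2 ^^ 7; // 128
    // ... and so on, down to ...
    if (xs[p + 2 ^^ 0] <= x) p += 2 ^^ 0; // 1
```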

Once we’ve determined all the bits of the candidate element position
`p`, all that’s left to do is to check if the element we’re
looking for is at that position.

```
    if (p > 0 && xs[p] == x) return p;
    return 0;
}
```

And with this we’re done. If we check the code generated for
`bsearch1000` and `bsearch` on the
compiler explorer we see that it’s virtually the same. The D
compiler even inlined the compile-time constants for us.
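
For reference, here is the whole generic function assembled from the snippets above. I’ve included my sketch of `iflog2` so the listing is self-contained; the post doesn’t show the original definition:

```
// Sketch of iflog2: integer floor of the base-2 logarithm.
int iflog2(size_t n)
{
    int k = 0;
    while (n > 1) { n >>= 1; ++k; }
    return k;
}

// The generic version of Shar's algorithm, assembled from the
// snippets in this post. The table is one-based; xs[0] is unused.
auto bsearch(T, size_t n)(ref T[n] xs, T x)
{
    enum k = iflog2(n - 1);

    auto p = 0;
    if (xs[2 ^^ k] <= x) p = (n - 1) - 2 ^^ k + 1;

    static foreach_reverse (int i; 0 .. k)
    {
        if (xs[p + 2 ^^ i] <= x) p += 2 ^^ i;
    }

    if (p > 0 && xs[p] == x) return p;
    return 0;
}
```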

One shortcoming of our solution is that it only works for static arrays. It would be nice if we could detect that we’re dealing with a dynamic array, and fall back to another algorithm.

You probably won’t be surprised to find out that we can–otherwise I wouldn’t be writing this section :^).

The whole implementation is a bit anticlimactic really. We can just
use function overloading, and declare a version of `bsearch`
that takes a dynamic array parameter:

```
auto bsearch(T)(T[] xs, T x)
{
    writeln("dynamic array: ", xs);
    // ...
    return 0;
}
```

The compiler will choose the right overload based on the arguments we call the function with.
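
As a quick illustration (a hypothetical usage sketch of my own; remember the static overload treats the table as one-based, so slot 0 is unused):

```
void main()
{
    import std.stdio : writeln;

    int[7] table = [0, 10, 20, 30, 40, 50, 60]; // xs[0] is ignored
    writeln(bsearch(table, 40)); // static overload; prints 4
    writeln(bsearch(table, 35)); // not found; prints 0

    int[] ys = [10, 20, 30];
    bsearch(ys, 20); // dynamic array overload
}
```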

Shar’s algorithm is a beautiful variation of binary search. If we know the table size in advance we can write an efficient binary search algorithm using it.

Bentley’s implementation is also beautiful because it squeezes every bit of performance from the code. The code does no more than it should, and there is a Zen kind of beauty to it.

D’s metaprogramming capabilities allowed us to take Bentley’s beautiful code, and make it more general. We can generate the code for any table size if we know it at compile time. Otherwise we fall back to an algorithm that can deal with dynamic arrays. Andrei Alexandrescu calls this technique “Design by Introspection”.

D is a fun language to program in. It shines for these kinds of
problems, where you can make heavy use of metaprogramming. I haven’t
used any other language where this problem could be expressed as
cleanly, except maybe Common Lisp. I’d be curious to see a solution in
Rust (which has a strong macro system) and ~~Zig (which I read has
strong support for metaprogramming as well)~~. Turns out someone implemented
the algorithm in Zig.

In the example in this post the key and the element are the same: they are integers.↩︎