Beautiful Binary Search in D

Alex Muscar

February 18, 2023

February 28, 2023: @tgehr posted a zero-based version of the algorithm. They also pointed out that my original code passed the static arrays by value. I’ve updated the code in the post to use ref for the array parameters as well as D’s exponentiation operator, ^^. @tgehr’s version also returns the first position of the element if there are multiple instances, while the one in this post returns the last one. Thanks for the suggestions!

February 18, 2023: @schveiguy pointed out that the initial approach to determining the static array size was overly complicated. I’ve updated the code to incorporate their suggestion. Thanks!

Recently, while I was reading “Writing Efficient Programs”, I came across a beautiful variation of binary search and its implementation. In this post I’ll be using D’s metaprogramming capabilities to implement a generic version of the algorithm.

For the rest of the post I’m going to assume you’re familiar with binary search, and its most common implementation–the bisection method using two pointers.

A beautiful algorithm

There are many ways to implement a binary search. The most common is probably using a “bisection search”, where we track the subintervals using two pointers, one moving from left to right, the other from right to left.

Another variant is the “uniform search”, where instead of using two pointers, we use only a single pointer and a “rate of change”, i.e., the start and the size of the subintervals. This is more subtle than bisection search–which is not trivial by the way. In Knuth’s words:

It is possible to do this, but only if extreme care is paid to the details, […]. Simpler approaches are doomed to failure.

On the other hand, uniform search has some advantages. One of them is that the rates of change can be precalculated and stored in a side table. If we get the rate of change calculation right–which is the subtle part–the search algorithm is simpler than its cousin using two pointers.

A variation of uniform search is Shar’s algorithm. It does away with the side table, and uses power of two interval sizes.

It starts by comparing the key we are looking for, K, with K_i, where i = 2^k, k = ⌊lg N⌋. If K < K_i, the interval sizes are 2^(k−1), 2^(k−2), …, 1, 0. But if K > K_i, we set i = N − 2^l + 1, where l = ⌈lg(N − 2^k + 1)⌉, adjust the interval pointer, and the interval sizes are 2^(l−1), 2^(l−2), …, 1, 0.

Shar’s algorithm determines the position of the entry with key K bit by bit. Each test adds one more bit. We need one more test, after we’ve determined all the bits, to see if the entry is actually in the table.
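To make the setup arithmetic concrete, here is a small D sketch of the two starting computations for a table of N = 1000 elements (the variable names are mine; the book does not show this calculation as code):

```d
import std.math : ceil, log2;

void main()
{
    immutable N = 1000;

    // First probe at i = 2^k, with k = floor(lg N).
    immutable k = cast(int) log2(cast(double) N);                // k = 9
    assert(2 ^^ k == 512);

    // If K > K_i, restart at i = N - 2^l + 1, l = ceil(lg(N - 2^k + 1)).
    immutable l = cast(int) ceil(log2(cast(double)(N - 2 ^^ k + 1))); // l = 9
    assert(N - 2 ^^ l + 1 == 489);
}
```

These are exactly the constants 512 and 489 that show up in the hand-unrolled code for a 1000-element table below.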

A beautiful implementation

Bentley provides a beautiful version of Shar’s algorithm in his book. The code works for tables with 1000 elements. The code in the book is written in Pascal, but transliterated to D it looks like this:

int bsearch1000(ref int[1001] xs, int x)
{
    auto i = 0;
    if (xs[512] <= x) { i = 1000 - 512 + 1; }
    if (xs[i + 256] <= x) { i += 256; }
    if (xs[i + 128] <= x) { i += 128; }
    if (xs[i + 64] <= x) { i += 64; }
    if (xs[i + 32] <= x) { i += 32; }
    if (xs[i + 16] <= x) { i += 16; }
    if (xs[i + 8] <= x) { i += 8; }
    if (xs[i + 4] <= x) { i += 4; }
    if (xs[i + 2] <= x) { i += 2; }
    if (xs[i + 1] <= x) { i += 1; }

    if (xs[i] == x) return i;
    else return 0;
}

There are a few things going on here. First, the odd array size, 1001. Pascal arrays are one-based. D follows in C’s footsteps with zero-based arrays. We just ignore xs[0] in this case. This is a bug by the way. Bentley acknowledges it, but he doesn’t provide a fix since he considers it would detract from the exposition. We can fix it by setting i to 1 initially, and making the necessary code adjustments. This would complicate the code somewhat. Another way to fix it is to explicitly check whether i is 0 in the last test.

Second, the code fully unrolls the search loop. This is only possible because we know the size of the table beforehand. The code can be adjusted for other table sizes as needed.

What makes this code beautiful is that it’s about as efficient as it could be. It’s also uniform, and relatively easy to understand if you know about Shar’s algorithm. It’s an almost word for word rendition of the algorithm tailored for this particular table size.
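As a quick sanity check, here is how bsearch1000 might be called (my example, not from the book; note that index 0 is left unused because the table is one-based):

```d
void main()
{
    int[1001] xs;
    foreach (i; 1 .. 1001)
        xs[i] = i * 2; // sorted table: xs[1] = 2, xs[2] = 4, ..., xs[1000] = 2000

    assert(bsearch1000(xs, 20) == 10); // found at position 10
    assert(bsearch1000(xs, 21) == 0);  // not in the table
}
```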

Beautiful metaprogramming

Beautiful as it is, Bentley’s code is somewhat tedious to adjust to other table sizes, and it’s easy to make mistakes while calculating the initial value of i. The code is very repetitive and uniform. This is a strong hint that we can automate writing it.

This is where D’s powerful metaprogramming capabilities come in. If we can determine the size of the table at compile time, we could in principle generate the code for the unrolled loop automatically.

As it turns out, we can determine if we’re dealing with a static array, and get its size at compile time. Before I show you the code, let’s break the problem down. The algorithm has three parts:

  1. The initial test, comparing the search key K with K_i for i = 2^k, k = ⌊lg N⌋;

  2. Determining successive bits of the candidate position by iterating over powers of two; and

  3. Checking whether we found the element we are looking for.

Let’s start with the function signature:

auto bsearch(T, size_t n)(ref T[n] xs, T x)

This is really neat. I didn’t expect the D compiler to deduce n–C++ won’t–but it worked just fine. T is the array element type, and n is the array’s length. This is a static value, available at compile time.

Then we determine k, the power of two where we start the search:

     enum k = iflog2(n - 1);

iflog2 stands for integer floor logarithm in base 2. It’s a regular D function, but it can be evaluated at compile time when called with a compile-time value, which is what we do here. Declaring k as an enum forces compile-time evaluation, much like constexpr in C++. Since our array is one-based, we subtract one from its length.
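The post doesn’t show iflog2 itself; a minimal version (my sketch, not necessarily what the original used) could be:

```d
// Integer floor of the base-2 logarithm; assumes n >= 1.
int iflog2(size_t n)
{
    int k = 0;
    while (n >>= 1)
        ++k;
    return k;
}

// CTFE lets us check it at compile time:
static assert(iflog2(1) == 0);
static assert(iflog2(1000) == 9);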

The function parameters are the table xs and the key x we are looking for.1 We’re passing xs by ref so we don’t copy the whole array when we call bsearch.

The initial test is just the code rendition of the test in Shar’s algorithm:

    auto p = 0;

    if (xs[2 ^^ k] <= x) p = (n - 1) - 2 ^^ k + 1;

We track the position of the candidate element in p.

Now for the fun bit, generating the power of two tests:

    static foreach_reverse (int i; 0 .. k)
        if (xs[p + 2 ^^ i] <= x) p += 2 ^^ i;

This code is remarkably short thanks to the problem’s regularity, which we mentioned earlier, and to D’s powerful metaprogramming capabilities. A static foreach is evaluated at compile time. And crucially, it doesn’t introduce a new scope. The code inside is just “spliced” into the code of the surrounding function. In effect, this snippet generates a series of if statements equivalent to the ones in bsearch1000. We use foreach_reverse to iterate over the exponents from k − 1 down to 0–the range 0 .. k is half-open, and we’re iterating over it in reverse. The choice of foreach_reverse as a keyword is somewhat unfortunate. There may be a cleaner way of achieving the same result, but I don’t use D regularly, so this is what I came up with :^).

Once we’ve determined all the bits of the candidate element position p all that’s left to do is to check if the element we’re looking for is at that position.

    if (p > 0 && xs[p] == x) return p;
    return 0;

And with this we’re done. If we check the code generated for bsearch1000 and bsearch on the compiler explorer we see that it’s virtually the same. The D compiler even inlined the compile time constants for us.
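For reference, here are the fragments above assembled into one function (this assumes the iflog2 helper discussed earlier):

```d
auto bsearch(T, size_t n)(ref T[n] xs, T x)
{
    // Starting exponent: k = floor(lg(n - 1)), since the table is one-based.
    enum k = iflog2(n - 1);

    // Initial test of Shar's algorithm.
    auto p = 0;
    if (xs[2 ^^ k] <= x) p = (n - 1) - 2 ^^ k + 1;

    // Unrolled at compile time into one if per power of two.
    static foreach_reverse (int i; 0 .. k)
        if (xs[p + 2 ^^ i] <= x) p += 2 ^^ i;

    // Final membership check; p == 0 means "not found".
    if (p > 0 && xs[p] == x) return p;
    return 0;
}
```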

Bonus: making it truly generic

One shortcoming of our solution is that it only works for static arrays. It would be nice if we could detect that we’re dealing with a dynamic array, and fall back to another algorithm.

You probably won’t be surprised to find out that we can–otherwise I wouldn’t be writing this section :^).

The whole implementation is a bit anticlimactic really. We can just use function overloading, and declare a version of bsearch that takes a dynamic array parameter:

auto bsearch(T)(T[] xs, T x)
{
    import std.stdio : writeln;

    writeln("dynamic array: ", xs);
    // ...
    return 0;
}

The compiler will choose the right overload based on the arguments we call the function with.
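A quick sketch of how the two overloads get picked (my example; the element values are arbitrary):

```d
void main()
{
    int[8] fixedTable = [0, 10, 20, 30, 40, 50, 60, 70]; // slot 0 unused (one-based)
    int[] dynamicTable = [10, 20, 30];

    bsearch(fixedTable, 30);   // static array: the generated, unrolled version
    bsearch(dynamicTable, 30); // dynamic array: the fallback overload
}
```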


Shar’s algorithm is a beautiful variation of binary search. If we know the table size in advance we can write an efficient binary search algorithm using it.

Bentley’s implementation is also beautiful because it squeezes every bit of performance from the code. The code does no more than it should, and there is a Zen kind of beauty to it.

D’s metaprogramming capabilities allowed us to take Bentley’s beautiful code, and make it more general. We can generate the code for any table size if we know it at compile time. Otherwise we fall back to an algorithm that can deal with dynamic arrays. Andrei Alexandrescu calls this technique “Design by Introspection”.

D is a fun language to program in. It shines for this kind of problem, where you can make heavy use of metaprogramming. I haven’t used any other language where this problem could be expressed as cleanly, except maybe Common Lisp. I’d be curious to see a solution in Rust (which has a strong macro system) and Zig (which I read has strong support for metaprogramming as well). Turns out someone implemented the algorithm in Zig.

  1. In the example in this post the key and the element are the same; both are integers.↩︎