X macros in D

Alex Muscar

February 25, 2023

Recently, while migrating a project from C to D (I’ll write about it at some point), I found myself missing X macros. As it turns out, D has a pretty neat solution for this problem. And yes, it involves metaprogramming :^).

X macros

If you’re not familiar with X macros you can read more about them in the article I linked, but here’s a quick introduction.

X macros solve the problem of keeping related entities in a program in sync. In my case it’s instructions for an intermediate representation (IR) in a compiler. I’ll use a simplified version as the running example for the rest of this post. I’ll keep the technical details of my particular problem relatively light, so we can focus on X macros.

My compiler generates the intermediate code in a byte buffer. It then emits assembly code from the intermediate code in the buffer. The IR instructions don’t have the same size. They take between one and five bytes. Each instruction has a format that the backend of the compiler uses to generate the assembly code for the instruction. Instructions are identified by an opcode.

One (common) way to represent the instructions is to have an enum for the opcodes, an array for the formats and another one for the sizes. Say we have three instructions: mov regd1, regs1; add regd1, regs1, regs2; and ret. This is enough to illustrate the idea. The straightforward implementation might look something like this (in ir.h):

enum
{
    IR_MOV,
    IR_ADD,
    IR_RET,
    IR_OP_CNT
};

char *irfmts[IR_OP_CNT] =
{
    "mov %r, %r",
    "add %r, %r, %r",
    "ret"
};

int irsizes[IR_OP_CNT] =
{
    3,
    4,
    1
};
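
To make the role of the two tables a bit more concrete, here’s a rough sketch of how a backend could walk the byte buffer with them. It’s only an illustration: the walk_ir helper is a made-up name, and the sketch assumes that the first byte of every instruction is its opcode and that each register operand takes one byte, which matches the sizes above but isn’t spelled out anywhere.

```c
#include <stddef.h>
#include <stdio.h>

enum { IR_MOV, IR_ADD, IR_RET, IR_OP_CNT };

static const char *irfmts[IR_OP_CNT] = { "mov %r, %r", "add %r, %r, %r", "ret" };
static const int irsizes[IR_OP_CNT] = { 3, 4, 1 };

/* Hypothetical helper: walks a buffer of encoded instructions, prints the
   format string of each one, and returns the number of instructions seen.
   Returns -1 on an unknown opcode or a truncated instruction.
   Assumption: the first byte of every instruction is its opcode. */
static int walk_ir(const unsigned char *buf, size_t len)
{
    int count = 0;
    size_t pc = 0;

    while (pc < len) {
        unsigned op = buf[pc];

        if (op >= IR_OP_CNT || (size_t)irsizes[op] > len - pc)
            return -1;

        printf("%s\n", irfmts[op]);
        pc += (size_t)irsizes[op];   /* skip the opcode and its operands */
        count++;
    }
    return count;
}
```

Both lookups are indexed by the opcode, so entry i of each array has to describe the same instruction as enum value i.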

The problem is that keeping the arrays in sync is error-prone. Here’s where X macros come in. By cleverly (ab-)using the C preprocessor we can move the IR instruction declarations into a single file, say ir.inc, and then get the arrays almost for free.

Here’s ir.inc:

XX(IR_MOV, "mov %r, %r", 3)
XX(IR_ADD, "add %r, %r, %r", 4)
XX(IR_RET, "ret", 1)

Then in ir.h:

enum
{
#define XX(OP, F, N) OP,
#include "ir.inc"
#undef XX
    IR_OP_CNT
};

char *irfmts[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = F,
#include "ir.inc"
#undef XX
};

int irsizes[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = N,
#include "ir.inc"
#undef XX
};

The arrays are now generated automatically from the single definition. This is one of the neater techniques involving the preprocessor in C. What is particularly neat is that we have a single, declarative definition of the instructions.
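
If you want to play with this without a separate ir.inc, here’s a single-file sketch of the same idea. It swaps the #include for a list macro that takes the per-entry macro as an argument, which is a common variant of the pattern. The names IR_OPS and dump_ir_table are made up for this sketch.

```c
#include <stdio.h>

/* Single-file stand-in for ir.inc: the list receives the per-entry macro
   as its argument X. IR_OPS is a hypothetical name. */
#define IR_OPS(X)                      \
    X(IR_MOV, "mov %r, %r",     3)     \
    X(IR_ADD, "add %r, %r, %r", 4)     \
    X(IR_RET, "ret",            1)

enum
{
#define XX(OP, F, N) OP,
    IR_OPS(XX)
#undef XX
    IR_OP_CNT
};

static const char *irfmts[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = F,
    IR_OPS(XX)
#undef XX
};

static const int irsizes[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = N,
    IR_OPS(XX)
#undef XX
};

/* Hypothetical helper: prints one line per instruction in the table. */
static void dump_ir_table(void)
{
    for (int op = 0; op < IR_OP_CNT; op++)
        printf("%d byte(s): %s\n", irsizes[op], irfmts[op]);
}
```

Adding an instruction is then a one-line change to IR_OPS, and the enum and both arrays pick it up on the next build.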

The D solution

It wasn’t obvious whether D supports anything similar to this pattern. I suspected there was something that could be done with metaprogramming, but I couldn’t come up with a working solution.

Then I stumbled on this post in the D forum. Paul Backus’s solution is particularly well suited for our scenario. The definition is now an enum whose members carry custom attributes, with a struct describing the attribute:

struct instr
{
    string fmt;
    int size;
}

enum IrOp
{
    @instr("mov %r, %r", 3) Mov,
    @instr("add %r, %r, %r", 4) Add,
    @instr("ret", 1) Ret,
}

The @instr(...) annotations are attributes attached to the enum variants. To generate our arrays we can now use D’s powerful metaprogramming:

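import std.meta : staticMap;
import std.traits : EnumMembers, getUDAs;

// Look up the @instr attribute on an enum member and read one field from it.
auto instrFmt(alias sym)() => getUDAs!(sym, instr)[0].fmt;
auto instrSize(alias sym)() => getUDAs!(sym, instr)[0].size;

// One array element per IrOp member, in declaration order.
auto instrFmts = [staticMap!(instrFmt, EnumMembers!IrOp)];
auto instrSizes = [staticMap!(instrSize, EnumMembers!IrOp)];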

Now this is nice! Not only is it just as declarative as the C version, but it’s also safer. In C there’s no check that the XX params are used correctly while generating the arrays. In D we access the attribute properties by name.
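
To see what that missing check can look like on the C side, here’s a small sketch of a positional mix-up. It’s a hypothetical variant of the running example in which every entry carries two integer fields, the size and an operand count. The operand count (NOPS), the IR_OPS list macro and the dump_sizes helper are all made up for illustration.

```c
#include <stdio.h>

/* Hypothetical variant of the running example: each entry has two integer
   fields, the encoded size and the operand count. */
#define IR_OPS(X)        \
    X(IR_MOV, 3, 2)      \
    X(IR_ADD, 4, 3)      \
    X(IR_RET, 1, 0)

enum
{
#define XX(OP, SIZE, NOPS) OP,
    IR_OPS(XX)
#undef XX
    IR_OP_CNT
};

/* Meant to hold the sizes, but the expansion picks NOPS by mistake.
   Both fields are ints, so the compiler has nothing to complain about. */
static const int irsizes[IR_OP_CNT] =
{
#define XX(OP, SIZE, NOPS) [OP] = NOPS,
    IR_OPS(XX)
#undef XX
};

/* Hypothetical helper: prints the table so the mix-up is visible. */
static void dump_sizes(void)
{
    for (int op = 0; op < IR_OP_CNT; op++)
        printf("op %d: size %d\n", op, irsizes[op]);
}
```

The table silently ends up holding operand counts instead of sizes. With named fields, as in the D version, this particular slip is harder to make.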

And that’s it.

Conclusion

I didn’t come up with this technique. But I found it both useful and really neat. The more I use the language, the more nice surprises like this one I stumble on. So far I’m digging D :^).