February 25, 2023
Recently, while migrating a project from C to D (I’ll write about it at some point), I found myself missing X macros. As it turns out, D has a pretty neat solution for this problem. And yes, it involves metaprogramming :^).
If you’re not familiar with X macros you can read more about them in the article I linked, but here’s a quick introduction.
X macros solve the problem of keeping related entities in a program in sync. In my case it’s instructions for an intermediate representation (IR) in a compiler. I’ll use a simplified version as the running example for the rest of this post. I’ll keep the technical details of my particular problem relatively light, so we can focus on X macros.
My compiler generates the intermediate code in a byte buffer. It then emits assembly code from the intermediate code in the buffer. The IR instructions don’t have the same size. They take between one and five bytes. Each instruction has a format that the backend of the compiler uses to generate the assembly code for the instruction. Instructions are identified by an opcode.
One (common) way to represent the instructions is to have an enum for the opcodes, an array for the formats and another one for the sizes. Say we have three instructions: mov regd1, regs1; add regd1, regs1, regs2; and ret. This is enough to illustrate the idea. The straightforward implementation might look something like this (in ir.h):
enum
{
    IR_MOV,
    IR_ADD,
    IR_RET,
    IR_OP_CNT
};
char *irfmts[IR_OP_CNT] =
{
    "mov %r, %r",
    "add %r, %r, %r",
    "ret"
};
int irsizes[IR_OP_CNT] =
{
    3,
    4,
    1
};
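Here is a small sketch of how this layout can go wrong. It assumes a hypothetical IR_SUB opcode (not part of the real example) gets added to the enum while the two tables are left untouched. The helpers ir_size and ir_tables_complete are also made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum
{
    IR_MOV,
    IR_ADD,
    IR_SUB,     /* hypothetical new opcode, added only here */
    IR_RET,
    IR_OP_CNT
};

/* The tables were not updated, so entries after IR_ADD shift by one slot. */
static const char *const irfmts[IR_OP_CNT] =
{
    "mov %r, %r",
    "add %r, %r, %r",
    "ret"       /* now sits in the IR_SUB slot; IR_RET is left NULL */
};

static const int irsizes[IR_OP_CNT] =
{
    3,
    4,
    1           /* now sits in the IR_SUB slot; IR_RET is left 0 */
};

/* Hypothetical accessor with a bounds check. */
static int ir_size(int op)
{
    assert(op >= 0 && op < IR_OP_CNT);
    return irsizes[op];
}

/* Hypothetical sanity check: returns 1 only if every opcode has
   a format string and a nonzero size. */
static int ir_tables_complete(void)
{
    for (int op = 0; op < IR_OP_CNT; op++) {
        if (irfmts[op] == NULL || ir_size(op) == 0)
            return 0;
    }
    return 1;
}
```

Nothing in this snippet is an error as far as the language is concerned: the missing array elements are simply zero-initialized, so the mismatch only shows up at run time.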
The problem is that keeping the arrays in sync is error prone. Here’s where X macros come in. By cleverly (ab-)using the C preprocessor we can move the IR instruction declarations into a single file, say ir.inc, and then get the arrays almost for free.
Here’s ir.inc:
XX(IR_MOV, "mov %r, %r", 3)
XX(IR_ADD, "add %r, %r, %r", 4)
XX(IR_RET, "ret", 1)
Then in ir.h:
enum
{
#define XX(OP, F, N) OP,
#include "ir.inc"
#undef XX
    IR_OP_CNT
};
char *irfmts[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = F,
#include "ir.inc"
#undef XX
};
int irsizes[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = N,
#include "ir.inc"
#undef XX
};
The arrays are now generated automatically from the single definition. This is one of the neater techniques involving the preprocessor in C. What is particularly neat is that we have a single, declarative definition of the instructions.
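For completeness, here is a sketch of a common single-file variant of the same trick: instead of a separate ir.inc, the list lives in a macro that takes the row macro as its parameter. The names IR_INSTRS, ROW and ir_code_size are made up for this example, and the helper just shows the generated tables being used.

```c
#include <assert.h>

/* Single source of truth: one row per instruction (opcode, format, size).
   IR_INSTRS and ROW are hypothetical names for this sketch. */
#define IR_INSTRS(ROW)                  \
    ROW(IR_MOV, "mov %r, %r",     3)    \
    ROW(IR_ADD, "add %r, %r, %r", 4)    \
    ROW(IR_RET, "ret",            1)

enum
{
#define XX(OP, F, N) OP,
    IR_INSTRS(XX)
#undef XX
    IR_OP_CNT
};

static const char *const irfmts[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = F,
    IR_INSTRS(XX)
#undef XX
};

static const int irsizes[IR_OP_CNT] =
{
#define XX(OP, F, N) [OP] = N,
    IR_INSTRS(XX)
#undef XX
};

/* Hypothetical helper: total encoded size, in bytes, of a list of opcodes. */
static int ir_code_size(const int *ops, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        assert(ops[i] >= 0 && ops[i] < IR_OP_CNT);
        total += irsizes[ops[i]];
    }
    return total;
}
```

Calling ir_code_size on the sequence IR_MOV, IR_ADD, IR_RET adds up 3 + 4 + 1 bytes. Whether the list goes in an include file or in a macro is mostly a matter of taste; the include file avoids the line-continuation backslashes.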
It wasn’t obvious whether D supported anything similar to this pattern. I suspected there was something that could be done with metaprogramming, but I couldn’t come up with a working solution.
Then I stumbled on this post in the D forum. Paul Backus’s solution is particularly well suited for our scenario. The definition is now an enum whose members carry custom attributes:
struct instr
{
    string fmt;
    int size;
}
enum IrOp
{
    @instr("mov %r, %r", 3) Mov,
    @instr("add %r, %r, %r", 4) Add,
    @instr("ret", 1) Ret,
}
The @instr(...) annotations are attributes attached to the enum variants. To generate our arrays we can now use D’s powerful metaprogramming:
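import std.meta : staticMap;
import std.traits : EnumMembers, getUDAs;

// Read one field of the @instr attribute attached to an enum member.
auto instrFmt(alias sym)() => getUDAs!(sym, instr)[0].fmt;
auto instrSize(alias sym)() => getUDAs!(sym, instr)[0].size;
auto instrFmts = [staticMap!(instrFmt, EnumMembers!IrOp)];
auto instrSizes = [staticMap!(instrSize, EnumMembers!IrOp)];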
Now this is nice! Not only is it just as declarative as the C version, but it’s also safer. In C there’s no check that the XX params are used correctly while generating the arrays. In D we access the attribute properties by name.
And that’s it.
I didn’t come up with this technique, but I found it both useful and really neat. The more I use the language, the more nice surprises like this one I stumble on. So far I’m digging D :^).