Pattern Matching in Rust's Neverland
Recently, I’ve been working on a tool for manipulating EVM bytecode and assembly language. I wanted to describe low level bytecode and assembly language using the same datatype. At the same time, I wanted some instructions to be allowed only in assembly language but not bytecode (e.g. labels only make sense at the assembly language level). I was hoping to encode this precisely in stable Rust, but couldn’t make it work. However, turns out you can do this on nightly (where RFC1872 is merged). That is actually pretty neat, and I want to explain why.
Background
A very simplified version of my bytecode system is:
enum Insn<T> {
PUSH(usize),
LABEL(T),
JUMP(T),
RET
}
type Bytecode = Insn<usize>;
type Assembly = Insn<String>;
This is a subset of the instructions I needed, but its enough to
illustrate what I was trying to achieve. As indicated, we instantiate
Insn<T>
in one of two ways:
Bytecode
. This describes concrete bytecode instructions where exact jump destinations (expressed as byte offsets) are known.Assembly
. This describes assembly language instructions which use labels to describe control-flow.
Thus, we can have a simple function which “assembles” assembly language instructions into concrete instructions:
fn assemble(&[Assembly]) -> Vec<Bytecode>
(Note: To make this work properly, we’d want some error handling but I’m ignoring that here). Similarly, we can have a simple function which “disassembles” concrete instructions back into assembly language instructions:
fn disassemble(&[Bytecode]) -> Vec<Assembly>
The interesting thing about these two functions is that some instructions don’t make sense, depending on the context. For example, labels are not bytecode instructions as, in effect, they are artificial instructions used only for assembly language. So, we might write something like this:
fn disassemble_insn(insn: Bytecode) -> Assembly {
match insn {
Bytecode::PUSH(imm) => {
Assembly::PUSH(imm)
}
Bytecode::JUMP(offset) => { todo!() }
Bytecode::RET => {
Assembly::RET
}
Bytecode::LABEL(_) => { unreachable!(); }
}
}
This is fine as it goes, but it’s a shame we have to use
unreachable!()
to signal the LABEL
case should not occur. So, I
was wondering: can we use Rust’s type system to encode these
constraints? More specifically, so that Insn::Label(usize)
: (1) cannot
be instantiated; and (2) cannot be matched against. The short answer is:
yes!
Stable Version
My first approach was to exploit types which cannot be instantiated (so-called empty types). It looks something like this:
enum Insn<T,L> {
PUSH(usize),
LABEL(L),
JUMP(T),
RET
}
enum Void {}
type Bytecode = Insn<usize,Void>;
type Assembly = Insn<String,String>;
This prevents instances of Bytecode::LABEL
from being created, since
we cannot create an instance of type Void
. That’s great!
Unfortunately, I was disappointed the following still didn’t compile:
fn disassemble_insn(insn: Bytecode) -> Assembly {
match insn {
Bytecode::PUSH(imm) => {
Assembly::PUSH(imm)
}
Bytecode::JUMP(offset) => { todo!() }
Bytecode::RET => {
Assembly::RET
}
}
}
Here, I’ve dropped the case for Bytecode::LABEL
since we know this
cannot arise any more. Unfortunately, the Rust compiler still
requires you to cover the case for Bytecode::LABEL
. So, I
tweeted
out my disappointment.
Nightly Version
Thanks to Varun who
responded
by pointing me to
RFC1872. This is
exactly what we needed to fix the above, though it is not yet in
stable. In essence instead our own Void
type, we use Rust’s new
so-called never type
(!
) which the compiler knows is uninhabited. Here’s the final
version:
#![feature(never_type)]
#![feature(exhaustive_patterns)]
enum Insn<T,L> {
PUSH(usize),
LABEL(L),
JUMP(T),
RET
}
type Bytecode = Insn<usize,!>;
type Assembly = Insn<String,String>;
Here, instead of using a custom Void
type in defining Bytecode
, we
can now use !
directly as a type. With this definition our version
of disassemble_insn()
above which omits the case for
Bytecode::LABEL
now compiles. Furthermore, Rust emits a warning
(unreachable pattern
) if a case for Bytecode::LABEL
is included!
Conclusion
The never type !
looks to be a very handy extension to Rust, and I’m
looking forward to it landing in the stable branch. When combined
with the exhaustive pattern matching extension in
RFC1872, it really
is very powerful.