Formalising the Ethereum Virtual Machine in Dafny

Monday, February 6th, 2023

Since starting at ConsenSys, the main project I have been involved with is a formalisation of the Ethereum Virtual Machine in Dafny called the “DafnyEVM”. Our goals share some similarity with Runtime Verification’s KEVM project. We want a formal semantics of the EVM which is executable so that, for example, we can test it against the Ethereum common tests. Using Dafny, however, offers some advantages over the K framework:

(Verification) With Dafny, we can verify security properties over EVM bytecode directly. For example, we can verify simple properties (e.g. no arithmetic overflow/underflow) as well as more complex properties (e.g. that an invariant between fields of the contract always holds).
(Documentation) Our Dafny formalisation gives a concise and readable description of the Ethereum Virtual Machine. In my opinion, it compares well against the Yellow Paper and the official Python spec (with the caveat that the DafnyEVM remains a work-in-progress).
(Compilation) In principle, we could develop a compiler for a high-level language using the DafnyEVM which guarantees that generated bytecode is correct. A good example of this is the CompCert compiler developed for the C language.

Whilst the DafnyEVM remains a work-in-progress, it can already pass a large number of the Ethereum common tests. So, let’s take a look inside!

Machine State

An executing EVM contains various key components of the executing state such as gas, pc, stack, code, memory, and worldstate. Roughly speaking, we implement this in Dafny like so:

datatype ExecutingEvm = EVM(
  gas: nat, 
  pc: nat, 
  stack: Stack, 
  code: Code,
  mem: Memory, 
  world: WorldState, 
  ...
)

On top of this, we have a notion of the machine state which maybe an executing EVM (as above), or a terminated EVM (e.g. having executed a RETURN or REVERT instruction, or failed with some kind of error):

datatype State = 
     EXECUTING(evm: ExecutingEvm) 
   | REVERTS(gas:nat, data:seq<u8>)
   | RETURNS(gas:nat, data:seq<u8>, ...) 
   | INVALID(Error) 
   | ...

Here, the REVERTS and RETURNS states include their RETURNDATA as a sequence of bytes (i.e. u8) along with any gas returned to the caller. To simplify our code, we can define a new type which captures the notion of an executing EVM as follows:

type ExecutingState = st:State | st.EXECUTING?

Variables of type ExecutingState have type State with the additional constraint that they are instances of State.EXECUTING.

Bytecode Semantics

There are over 140 bytecode instructions supported in the Ethereum Virtual Machine. These include simple arithmetic operations, memory reading/writing, storage reading/writing, contract calls, etc. The semantics (i.e. meaning) of every instruction is formalised in the DafnyEVM.

As a first example, here is our formalisation of the ADD instruction (opcode 0x01):

function Add(st: ExecutingState): (st': State) 
// Execution either continues or halts with stack underflow
ensures st'.EXECUTING? || st' == INVALID(STACK_UNDERFLOW)
// Execution always continues if at least two stack operands
ensures st'.EXECUTING? <==> st.Operands() >= 2
// Execution reduces stack height by one
ensures st'.EXECUTING? ==> st'.Operands() == st.Operands() - 1
{
    if st.Operands() >= 2
    then
        var lhs := st.Peek(0) as int;
        var rhs := st.Peek(1) as int;
        var res := (lhs + rhs) % TWO_256;
        st.Pop().Pop().Push(res as u256).Next()
    else
        INVALID(STACK_UNDERFLOW)
}

This function takes an executing EVM state, and produces an updated state (which may or may not be still executing). The function requires at least two operands on the stack, otherwise execution halts with a stack underflow exception. Furthermore, the lhs (i.e. left-hand side) is at offset 0 from the top of the stack, whilst the rhs is at offset 1. The two operands are then added together modulo TWO_256 to ensure the result fits into a u256. Thus, we can see from this that the EVM does not automatically catch arithmetic overflow for us.

Our semantics above for the ADD instruction illustrates another feature of Dafny. That is, we can add postconditions for our functions to capture salient properties which we want to check. Often these properties correspond to parts of the Yellow Paper which can help make clear the connection between them. Whilst these are not strictly required (technically speaking, because we’re using functions here), they can add a form of useful (and checked) documentation of what we can expect.

As a second example, consider the semantics given for the MLOAD instruction (i.e. opcode 0x51):

function MLoad(st: ExecutingState): (st': State)
// Execution either continues or halts with stack underflow
ensures st'.EXECUTING? || st' == INVALID(STACK_UNDERFLOW)
// Execution always continues if at least one stack operands
ensures st'.EXECUTING? <==> st.Operands() >= 1
// Execution does not affect stack height
ensures st'.EXECUTING? ==> (st'.Operands() == st.Operands())
{
   if st.Operands() >= 1
   then
      var loc := st.Peek(0) as nat;
      // Expand memory as necessary
      var nst := st.Expand(loc,32);
      // Read from expanded state
      nst.Pop().Push(nst.Read(loc)).Next()
   else
      INVALID(STACK_UNDERFLOW)
}

In this case, only one operand on the stack is required (i.e. the address in memory to load from). The purpose of the Expand(loc,32) is to ensure sufficient memory to store a u256 word at address loc. This also illustrates some benefits from using a language like Dafny. The Read function is defined like so:

function method Read(address:nat) : u256
requires (address + 31) < Memory.Size(evm.memory) { 
  ... 
}

Thus, if the Expand(loc,32) call above were omitted or specified incorrectly (e.g. as Expand(loc,31)) then Dafny would flag an error for us at compile time. Of course, Read() could automatically expand memory for us — but, making explicit the places where memory expansion can occur helps ensure expansion is done correctly.

Verification Example

Using the DafnyEVM we can prove properties of interest on bytecode sequences. An example proof is the following:

method AddBytes(x: u8, y: u8) {
  // Initialise an EVM.
  var st := InitEmpty(gas:=1000);
  // Execute three bytecodes
  st := Push1(x);
  st := Push1(y);
  st := Add();
  // Check top of stack is sum of x and y
  assert st.Peek(0) == (x as u256) + (y as u256);
}

This proof roughly states that: for any two byte values x and y on the stack, ADD returns their sum. Observe that, since x and y are bytes, overflow is not possible and Dafny recognises this. If x and y had type u256 then Dafny would flag the potential arithmetic overflow for us at compile time.

As a second (and more interesting) example, let’s consider a more complete contract:

      // Load counter on stack
0x00: PUSH1 0x0
0x02: SLOAD
      // Increment by one
0x03: PUSH 0x1
0x05: ADD 
      // Check for overflow
0x06: DUP1
0x07: PUSH1 0xf
0x09: JUMPI
      // Overflow, so revert
0x0a: PUSH1 0x0
0x0c: PUSH1 0x0
0x0e: REVERT,
      // No overflow
0x0f: JUMPDEST
      // Write back
0x10: PUSH 0x0
0x12: SSTORE
      // Done
0x13: STOP

This contract maintains a counter at storage location 0 which is incremented by one on every contract call. The contract should revert if an overflow occurs. Thus, we should be able to prove that whenever the contract is called: (i) either the counter is incremented; or (ii), the contract reverts. Using the DafnyEVM, the proof of this looks (roughly speaking) as follows:

method IncProof(st: ExecutingState) returns (st': State)
// Initial state has PC == 0 and an empty stack
requires st.PC() == 0 && st.Operands() == 0
// Assume there is enough gas
requires st.Gas() >= 40000
// Success guaranteed if can increment counter
ensures st'.RETURNS? <==> (st.Load(0) as nat) < MAX_U256
// If success, counter incremented
ensures st'.RETURNS? ==> st'.Load(0) == (st.Load(0) + 1) {
  var nst := st;
  // Load counter
  nst := Push1(nst,0x0);
  nst := SLoad(nst);
  // Increment it by one
  nst := Push1(nst,0x1);
  nst := Add(nst);
  // Check for overflow
  nst := Dup(nst,1);
  nst := Push1(nst,0xf);
  nst := JumpI(nst);
  // Case analysis on outcome
  if nst.Peek(0) == 0 {
    // Overflow occurred
    assert nst.PC() == 0xa;
    nst := Push1(nst,0x0);
    nst := Push1(nst,0x0);
    nst := Revert(nst);
    assert nst.REVERTS?;
  } else {
    // No overflow!
    assert nst.PC() == 0xf;
    nst := JumpDest(nst);
    // Write back
    nst := Push1(nst,0x0);
    nst := SStore(nst);
    nst := Stop(nst);
    assert nst.RETURNS?;
  }
  return nst;
}

The proof is quite long, but it essentially follows the control-flow of the original contract. The nice thing is we can reuse Dafny language constructs (e.g. if and assert) to help manage the proof. More importantly, Dafny checks at compile time that the various assert statements and ensures clauses are always true (i.e. that they hold for all possible inputs which meet the requires conditions).

Conclusion

The DafnyEVM represents an interesting formalisation of the Ethereum Virtual Machine. This allows us to show strong correctness properties for arbitrary bytecode sequences. Whilst its not yet finished, it is certainly showing promise. However, a key challenge lies in scaling up the proofs to handle larger real-world contracts (and that is something I will talk more about in the future).