Efficient Functions in Dafny
In developing an EVM in
Dafny, a key requirement is
to test it against the official EVM test
suite. This means it needs to be
at least reasonably efficient, otherwise completing a test run becomes
impractical. At the same time, our DafnyEVM is intended to serve as a
specification for which can be used to verify bytecode contracts (see
e.g. here). This
means we use Dafny function
s to implement our EVM, rather than
method
s. For various reasons, a function
in Dafny is fairly
restricted and, for example, must use recursion instead of
imperative-style loops. Dafny will exploit tail
recursion to improve
performance, but this is not always possible. Thus, we have a
problem: we need to use function
s for verification, but these are
(sometimes) less efficient than method
s.
Example
To illustrate, consider the following method taken from our DafnyEVM:
function ToBytes(v:nat) : (r:seq<u8>)
ensures |r| > 0 {
// Extract the byte
var b : u8 := (v % 256) as u8;
// Determine what's left
var w : nat := v/256;
if w == 0 then [b]
else
ToBytes(w) + [b]
}
This converts an arbitrary-sized unsigned integer into a sequence of one or more bytes (in big endian form). When translating this into Java, Dafny regards it as tail recursive and generates (very roughly speaking) the following code:
List<Byte> ToBytes(v:BigInteger) {
ArrayList<Byte> r = new ArrayList<>();
while (true) {
byte b = v.mod(_256).byteValue();
BigInteger w = v.div(_256);
if(w.signum() == 0) {
return r;
}
r.add(b);
} }
Dafny optimises the recursive call into a while
loop — thereby
making it (relatively) efficient. Consider now the analoguous
operation which converts a sequence of bytes back into an
arbitrary-sized unsigned integer:
function FromBytes(bytes:seq<u8>) : nat {
if |bytes| == 0 then 0
else
var last := |bytes| - 1;
var byte := bytes[last] as nat;
var msw := FromBytes(bytes[..last]);
(msw * 256) + byte
}
Dafny does not recognise this as tail recursive and, hence, generates (very roughly speaking) the following Java:
BigInteger FromBytes(bytes: List<Byte>) {
if bytes.length() == 0 {
return BigInteger.ZERO;
} else {
int last = bytes.length() - 1;
byte byte = bytes.get(last);
bytes = bytes.subList(0,last);
BigInteger msw = FromBytes(bytes);
msw = msw.mul(_256);
msw = msw.add(BigInteger.valueOf(byte & 0xff));
return msw;
} }
Unfortunately, on the official Ethereum test suite this implementation
raises a StackOverflowException
on some tests. Specifically, on
tests for certain precompiled
contracts which generate very
large integers from long byte sequences.
Function by method
Dafny supports a little-known feature (by method
) which allows a
function
to be implemented using a method
. This means the
specification can still be given in the functional (i.e. recursive)
style whilst the implementation is given in an imperative style
(i.e. with loops).
To better understand this, consider the following example:
function sum(items: seq<nat>) : (r:nat) {
if |items| == 0 then 0
else items[0] + sum(items[1..])
}
This recursively computes the sum of a sequence of unsigned integers
(nat
s). Using by method
, we can add an imperative implementation
like so:
function sum(items: seq<nat>) : (r:nat) {
if |items| == 0 then 0
else items[0] + sum(items[1..])
} by method {
r := 0;
var i : nat := |items|;
while i > 0
invariant r == sum(items[i..]) {
i := i - 1;
r := r + items[i];
} }
This uses a while
loop to implement the functional specification.
The key point is that Dafny automatically proves the imperative
version correctly implements the functional specification. Whilst
that is pretty amazing, Dafny does need help to do this: firstly, an
invariant
was needed to help Dafny match the loop and the recursive
specification at each step; secondly, to make this work, the loop must
traverse backwards through the sequence (which is perhaps not that
intuitive). However, having done this, we now know sum()
will not
exhaust the call stack at runtime on large inputs!
We can now see the final version of our FromBytes()
function from before:
function FromBytes(bytes:seq<u8>) : (r:nat) {
if |bytes| == 0 then 0
else
var last := |bytes| - 1;
var byte := bytes[last] as nat;
var msw := FromBytes(bytes[..last]);
(msw * 256) + byte
} by method {
r := 0;
for i := 0 to |bytes|
invariant r == FromBytes(bytes[..i]) {
var ith := bytes[i] as nat;
r := (r * 256) + ith;
LemmaFromBytes(bytes,i);
}
// Dafny needs help here :)
assert bytes[..|bytes|] == bytes;
// Done
return r;
}
The by method
implementation is similar to what we did for the
sum()
example, though Dafny requires slightly more hand-holding to
show the while
loop is equivalent to the recursive specification.
In particular, a lemma LemmaFromBytes()
is needed to help
reestablish the loop invariant (not shown). Regardless, the net
effect is the same — FromBytes()
no longer exhausts the call stack
on the official EVM test suite. That is a great result!
Conclusion
In languages like Dafny, a dichotomy exists between functional specifications and imperative implementations: Specifications must be functional so the underlying theorem prover can safely reason about them; on the other hand, algorithms are often more efficient when implemented using loops. The ability to seamlessly cross the divide between these two extremes is a unique feature of Dafny which (when the need arises) is compelling.