Simplification vs Minimisation of Types in Whiley

Tuesday, August 30th, 2011

Recently, I’ve been trying to harden up the implementation of Whiley’s type system. The reason for this is fairly straightforward: bugs in the code often prevent me from compiling correct programs!

In thinking about how to restructure the algorithms I’m using, I realised its important to distinguish simplification from minimisation. I’ve talked about minimisation in some detail before (see here and here).

In fact, there are three phases in computing the ideal representation of a given type (in order of increasing difficulty):

Simplification. Here, we apply mostly straightforward simplifications. Examples include: (T|T) => T, (T|any) => any, (T|void) => T, (T|(S|U)) => (T|S|U), etc.
**Minimisation. **Here, we ensure that no two nodes in the type graph are equivalent under the subtyping operator. For example, the type X<[null|[null|X]]> is not minimised, whilst X<[null|X]> is its minimised form.
Canonicalisation. The final step is to ensure equivalent types have an identical representation on the machine. This is related to the Graph isomorphism problem and, more specifically, the issues of computing a canonical form of a graph.

This all seems fairly straightforward … but there are of course some tricky bits!!

Simplification is not that Simple!

The first mistake I made was to assume that simplification was a simple step in the process. Unfortunately, there are some gotchas:

define LinkedList as null | { LinkedList next, int data }
define MyList as int | LinkedList

The problem is how to minimise this. One property I want of the simplified form is to eliminate unions of unions. But, consider the type graph for MyList:

(here, circles represent unions, squares represent records, etc)

Now, it’s difficult to see how we simplify the above to remove the union (node 0) of a union (node 2). That’s because of the recursive link directly into node 2. In fact, we can achieve this by judiciously expanding the type like so:

This does the trick and (I believe) it’s always possible to eliminate unions of unions in this way.

Simplification Helps Minimisation!

You might be wondering: *why bother with simplification at all? * Well, it’s because it simplifies the minimisation algorithm (which is one of the components that features a lot of bugs). In essence, simplification gives me the following property:

Property 1. Any two equivalent nodes in a simplified type have identical reachable structure.

In a non-simplified type graph, the above is clearly not true. For example, in the type (any|int) the outer union and any are equivalent (but have different structure). Here’s another example:

Here, nodes 0, 1 and 2 are all equivalent. But, whilst nodes 1 and 2 have identical reachable structure, this differs from node 0. In particular, the children of node 0 are in the same equivalence class as node 0, whilst those for nodes 1 and 2 are not. In practical terms, my minimisation algorithm would have to handle this edge case and its numerous variations. With simplified form, however, these awkward cases disappear.

Minimised is Simplified?

An interesting question is whether a type remains in simplified form after minimisation. I conjecture that the answer is yes! Atleast, provided a little care is taken.

The main issue is that the minimisation process can remove union nodes; however, it cannot introduce them. Consider this final example:

define LinkedList as null | { LinkedList next, int data }
define InnerList as null | { OuterList next, int data }
define OuterList as null | { InnerList next, int data }
define MyList as LinkedList | OuterList

These definitions give rise to the following type graph for MyList:

This type graph is fully simplified (it’s a bit of a monster though!). This is because simplification does not attempt to eliminate equivalent nodes from a union, only identical ones. After minimisation, the outermost union will be removed and we’ll be left with just this:

Now, back to the original issue. In the general case, we can remove a union node from between any two nodes. However, we cannot remove other kinds of nodes unless the entire subtree below that node is removed. Therefore, we cannot introduce a union of union through this process.