Dr. David J. Pearce

Normalising Recursive Data Types


Recently, I came across an interesting problem with the type system I’m using in Whiley.  Specifically, the problem relates to recursive types which are equivalent but not identical.  Consider, for example, the following Whiley code:

define Link as { int data, LinkedList next }
define LinkedList as null | Link

This is a fairly straightforward definition of a linked list.

Now, the question is: what is the type of a LinkedList? To determine this, the compiler does a depth-first search from the define point.  So, starting from  LinkedList, it traverses into the body of the defintion, and then into the bodies of those datatypes it contains (i.e. Link in this case).  At some point during this traversal, it will encounter the name LinkedList again, which signals a recursive data type.  This traversal leads to the following (recursive) type: X < null | {int data, X next} > Here, the braces signify a recursive type on variable X.  Such a type essentially captures an infinite number of concrete types, which can be obtained by unfolding.  Some examples are

null
{int data, null next}
{int data, {int data, null next} next}

All of these are subtypes of the original recursive type.

Now, onto the problem at hand, which is nicely illustrated by the following Whiley code:

int length(LinkedList list):
  if list ~= Link:
    return 1 + length(list.next)
  else:
    return 0

This little example shows how to determine the length of a list.  In particular, we use the ~= type test operation to check the runtime type of list.  What this doing is essentially checking whether Link is a subtype of LinkedList.  The actual types in question are:

LinkedList ==> X < null | {int data, X next} >
Link ========> Y < {int data, null|Y next} >

So, how do we test whether one is the subtype of the other? Well, this is the thing: we need them to be the same, but they’re obviously not.

To resolve this issue, we can normalise the recursive types.  There are different ways this could be done, but for the moment I do it by unfactoring them.  That is, by pulling out bits which are not recursive, like so:

X < null | {int data, X next} >
=====> null | X < {int data, null|X next}>

The key is that the second component of the derived type is now identical to the type of Link and we can now easily perform the subtype test.

Anyway, it’s not completely clear to me whether this unfactoring process is sufficient for all possible cases or, indeed, what the complexity class of subtype testing in the presence of recursive types like these is (hopefully, it’s at least decidable :).  Digging around briefly, I did come across the following paper which seems related, but I need time to digest it properly: