Playing Fast and Loose With GADTs
We’re going to be playing with Generalized Algebraic Data Types (terrible article) in Clojure. To preface, I am not an authority on GADTs, or much of anything — I mostly learn about things by implementing and explaining them. The code is here.
We’ll begin with a brief exposition of what these things are, and then go on to implement a tiny, domain-specific language in which we can express them, pattern-match them, and define functions over them. We will then implement a trivial, type-safe expression interpreter in this DSL.
Note that we’re not interested in a concrete implementation of GADTs, just exploring them at the type level — we’re writing a type-inferring checker, not adding a feature to a programming language.
Algebraic?
What’s algebraic about a data type? If we think about particular ways of combining types as mirroring algebraic operations, it’ll help us reason about fancier typing constructs (and compilers love it). Let’s look at a “product type”:
data Point2D = Point2D Float Float
This type can represent every possible float paired with every
possible float, or Float * Float
Which is a sick number of
floats1. We’ve also got sum types:
data Point = Point2D Float Float | Point3D Float Float Float
Which correspond to addition, or (Float * Float) + (Float * Float *
Float)
. I’m taking for granted that everyone has internalized this
syntax, but the above declares, in Haskell-land, a type Point
with
two constructors, Point2D
and Point3D
. Our ADTs can take type
variables, e.g. Maybe
, which is called Option
almost
everywhere else:
data Maybe a = Nothing | Just a
Where a
is a type variable. These are just ADTs - what’s a
generalized ADT?
Generalized ADTs
In the previous example, Point2D
and Point3D
are both Point
,
which is fine, but there are cases where we’d like different type
constructors to produce different instances of the type. The classic
example is the typed interpreter:
data Expr a where
LitInt :: Int -> Expr Int
LitBool :: Bool -> Expr Bool
Add :: Expr Int -> Expr Int -> Expr Int
If :: Expr Bool -> Expr a -> Expr a -> Expr a
These constructors are just functions - Add
takes two Expr Int
and
returns an Expr Int
. The power of this type becomes apparent when
we look at the implementation of an evaluation function:
eval :: Expr a -> a
eval (LitInt n) = n
eval (LitBool b) = b
eval (Add x y) = eval x + eval y
eval (If cond t e) = if eval cond then eval t else eval e
When we pattern match on an Expr
constructor (e.g. LitInt n
), we
refine the type variable (eval
is typed as Expr a -> a
), and
thus know at compile time that n
is an Int
, and can treat it like
one on the right-hand side of the =
. We can’t do this with plain
ADTs. There are other cute applications, like type-level data structure
invariants.
The specifics will be covered in more detail below.
Implementation
I’m going to repeatedly refer to unification, which will be familiar to anyone who has experimented with logic programming, or type inference. In our context, it means finding a variable substution which will make two types equivalent, if such a thing is possible.
Our implementation is purely functional, because that’s the simplest approach here. It isn’t necessarily the simplest approach for for all unification problems.
Note that the implementation is optimized for brevity and simplicity; there are many edge cases it doesn’t handle. We’re doing the bare minimum to support our typed interpreter example.
Beyond the initial explanations, we will also use an idealized syntax
when discussing terms in the program, like {A (List A)}
instead of
spelling out the record types.
Let’s get the boring stuff out of the way. I apologize in advance that I write and format my code in a particularly twee style.
(ns gadt.core
(:require [clojure.string :as str]
[clojure.walk :as walk]))
(defrecord TypeVar [name])
(defrecord TypeCtor [name tag params ret-type])
Type constructors hold the name of the type (e.g. Expr
), the tag, or
constructor name (IntLit
) and a sequence of the type parameters and
the return type.
(defrecord Type [name params])
Above is how we represent occurrences of the (parameterized) type
itself within its constructors, e.g. all instances of Expr
in:
...
Add :: Expr Int -> Expr Int -> Expr Int
While these are strictly type constructors, we treat them differently because:
- We can’t construct types with them outside of the definition of
Expr
. - They require unification, as they appear in the parameter lists and return types of exposed type constructors.
(defn type-var? [t] (instance? TypeVar t))
(defn type-ctor? [t] (instance? TypeCtor t))
(defn type? [t] (instance? Type t))
(defn atomic-type? [t] (keyword? t))
(defn make-type-var [name]
(->TypeVar name))
(defn make-type-ctor [name tag params ret-type]
(->TypeCtor name tag params ret-type))
(defn make-type [name params]
(->Type name params))
(defn env-lookup [env var]
(or (env var)
(throw (ex-info (str "Unbound identifier: " var) {:var var}))))
env
is just an immutable map. In practice, it’s almost always
*env*
, a dynamic binding. Questionable choice, maybe, but we’re
going to have functions with pattern matching expressions inside of
them, and if they all had to be passed an environment map explicitly,
it would get messy. Let’s jump right into how we declare our data
types:
Declarations
(declare-data-type List [A]
(Nil [] (List A))
(Cons [A (List A)] (List A)))
We’re using upper-case letters for type-variables by convention.
What’s happening here is we’re introducing a GADT called List
, with
a single universally quantified (i.e. “forall A
”) type variable,
which the Nil
constructor does not constrain — it returns the
polymorphic (List A)
. The Cons
constructor takes a value of some
type, and a List
of that same type, returning a List
of that one
type.
The declare-data-type
macro is not
important, but after it runs, we will end up with the following
*env*
:
{Nil
#TypeCtor{:name List
:tag Nil
:params []
:ret-type #Type{:name List :args [#TypeVar{:name A}]}}
Cons
#TypeCtor{:name List
:tag Cons
:params [#TypeVar{:name A}
#Type{:name List :args [#TypeVar{:name A}]}]
:ret-type #Type{:name List :args [#TypeVar{:name A}]}}}
Whew. It’s important to note that it’s not called
define-data-type
— the form binds no names in the current
namespace, only in the currently bound type environment.
Typechecking
Let’s look at a redacted version of typecheck
, and imagine the case
where we’re typechecking the constructor (Nil)
in the environment we
defined above.
(defn typecheck
([expr env]
(cond
(atomic-type? expr) expr
(number? expr) :number
(string? expr) :string
(boolean? expr) :bool
(symbol? expr) (env-lookup env expr)
....
(and (list? expr) (symbol? (first expr)))
(let [[ctor & args] expr
ctor-type (env-lookup env ctor)]
(cond
(type-ctor? ctor-type)
(let [param-types (:params ctor-type)]
(assert (= (count args) (count param-types)))
(let [ret-type (:ret-type ctor-type)
arg-types (map #(typecheck % env) args)
subst (unify-pairwise param-types arg-types)]
(instantiate ret-type subst)))
...
:else (throw (ex-info "Not a constructor or function"
{:expr expr}))))))
([expr] (typecheck expr *env*)))
We can see that not much would happen. unify-pairwise
has nothing
to do, because there are no parameters or arguments, and instantiate
(which substitutes type variables for types, given a substitution map)
will return the input type unmolested if given an empty one. Here,
we’re instantiating the return type of (Nil)
, which is the
polymorphic (List A)
. So, the result of typechecking (Nil)
is
(List A)
.
instantiate
is pretty simple, subst
is a map of type variables to
types, and we try to look up each type variable in it,
recurring structurally anywhere a type variable might be hiding.
(defn instantiate [type subst]
(cond
(type-var? type) (subst type type)
(type-ctor? type) (make-type-ctor
(:name type)
(:tag type)
(map #(instantiate % subst) (:params type))
(instantiate (:ret-type type) subst))
(type? type) (make-type
(:name type)
(map #(instantiate % subst) (:params type)))
:else type)))
Our unification functions:
(declare unify)
(defn unify-pairwise [t1s t2s]
(reduce
(fn [subst [arg1 arg2]]
(let [arg1' (instantiate arg1 subst)
arg2' (instantiate arg2 subst)
subst' (unify arg1' arg2')]
(merge subst subst')))
{}
(map vector t1s t2s)))
(defn unify [t1 t2]
(cond
(= t1 t2) {}
(type-var? t1)
(if-not (contains? (type-vars t2) t1)
{t1 t2}
{})
(type-var? t2)
(if-not (contains? (type-vars t1) t2)
{t2 t1}
{})
(and (type? t1)
(type? t2)
(= (:name t1)
(:name t2))) (unify-pairwise (:params t1) (:params t2))
:else
(throw (ex-info "Cannot unify" {:types [t1 t2]}))))
(defn try-unify [t1 t2]
(try
(unify t1 t2)
(catch Exception _
nil)))
type-vars
(elided), is similar structurally to instantiate
insofar is it visits the interesting parts of each type.
What about a case like (typecheck '(Cons 2 (Nil)))
? In typecheck
, the parameter types for Cons
are [A (List A)]
, and our supplied arguments are of type [:number (List A)]
, obtained by recursive calls to typecheck
. 2 is trivially a :number
, and recall from our previous example that the return type of (Nil)
is the polymorphic (List A)
. We then call unify-pairwise
on [A (List A)] [:number (List A)]
which:
- Instantiates
A
and:number
(no-op) - Unifies
A
and:number
, yielding the substition map{A :number}
- Instantiates
(List A)
and(List A)
with that substition map (both instantiate to(List :number)
) - Both trivially unify via
=
, adding no additional substitutions - The return type of
Cons
, which is(List A)
, is then instantiated in{A :number}
, giving us the type(List :number)
for the expression
Now, we want to pattern match our types. This involves some trade-offs. I didn’t want the implementation to grow in complexity unnecessarily, so I didn’t handle nested constructors, meaning that this may very well match:
(match (Cons 1 (Nil))
[(Cons x (Cons y (Nil))) x])
I don’t see this as an immediate problem, because this is a fledgling typechecker, which is not often going to have access to concrete terms.
Pattern Matching
We need to add another clause to typecheck
:
(defn typecheck [expr env]
(cond
...
(and (list? expr) (= (first expr) 'match))
(let [[_ inspect & cases] expr
ty (typecheck inspect env)]
(or (process-match-cases env ty cases nil)
(throw (ex-info "No matching cases in match expression" {:expr expr}))))))
Let’s take match pattern like this, the body of a hypothetical first
function:
(declare-data-type Maybe [A]
(Nothing [] (Maybe A))
(Just [A] (Maybe A)))
(match (Cons 1 (Nil))
[(Cons x _) (Just x)]
[(Nil) (Nothing)])
Which typechecks to (Maybe :number)
. We first typecheck (Cons 1
(Nil))
, getting (List :number)
and bind the patterns to cases
(in the above code listing). Let’s look at process-match-cases
:
(defn process-match-cases [env ty cases result]
(if (empty? cases)
result
(let [[[pat body] & rest-cases] cases]
(if-let [refined-env (pattern-match env pat ty)]
(let [body-type (typecheck body refined-env)
result (if result
(if (type-var? result)
result
(if (type-var? body-type)
body-type
(if-let [subst (try-unify result body-type)]
(instantiate result subst)
ty)))
body-type)]
(process-match-cases env ty rest-cases result))
(process-match-cases env ty rest-cases result)))))
We process all of the cases, because we’re a type checker.
pattern-match
returns nil
if it can’t match the given pattern
against the supplied type, otherwise it returns the environment
extended with the bindings that’ll exist on the right hand side of the
match (the body). The handling of result
is to try and maintain the
most general type possible across all matching patterns. Here’s pattern-match
:
(defn pattern-match [env pattern ty]
(cond
(= pattern '_) env
(symbol? pattern) (assoc env pattern ty)
(and (list? pattern) (symbol? (first pattern)))
(let [[ctor-name & sub-pats] pattern
ctor-type (try
(env-lookup env ctor-name)
(catch Exception _
nil))]
(when (and ctor-type (type-ctor? ctor-type))
(let [param-types (:params ctor-type)
ret-type (:ret-type ctor-type)]
(when (= (count sub-pats) (count param-types))
(when-let [subst (try-unify ret-type ty)]
(let [refined-params (map #(instantiate % subst) param-types)]
(loop [env env
[p & patterns] sub-pats
[t & types] refined-params]
(if-not t
env
(when-let [env (pattern-match env p t)]
(recur env patterns types))))))))))
(or (number? pattern)
(boolean? pattern)
(string? pattern))
(if (try-unify (typecheck pattern env) ty)
env
nil)
:else nil))
In the case where we’re matching the pattern (Cons x _)
we’ll hit
the third clause in the cond
. Recall that ty
is the result of
typechecking the subject of the match pattern — (Cons 1 (Nil))
.
refined-params
will be [:number (List :number)]
, so the recursive
calls in loop
(x => (assoc env x :number), _ => env
) refine the
environment. The result is (Maybe :number)
when the body, (Just
x)
, is typechecked in the refined env by the result processing logic
in process-match-cases
.
Functions
(defrecord FuncType [params ret-type])
(defn func-type? [t] (instance? FuncType t))
(defn make-func-type [param-types ret-type]
(->FuncType param-types ret-type))
(def ^:dynamic *env*
{'+ (make-func-type [:number :number] :number)
'- (make-func-type [:number :number] :number)
'* (make-func-type [:number :number] :number)
'not (make-func-type [:bool] :bool)
'= (make-func-type [:number :number] :bool)})
(defn instantiate [type subst]
(cond
(func-type? type) (make-func-type
(map #(instantiate % subst) (:params type))
...)))
(defn unify [t1 t2]
(cond
(and (func-type? t1)
(func-type? t2))
(let [param-subst (unify-pairwise (:params t1) (:params t2))
ret-subst (unify (instantiate (:ret-type t1) param-subst)
(instantiate (:ret-type t2) param-subst))]
(merge param-subst ret-subst))
...))
(defn typecheck
([expr env]
(cond
...
(and (list? expr) (= (first expr) 'if))
(let [[_ cond then else] expr
cond-type (typecheck cond env)
then-type (typecheck then env)
else-type (typecheck else env)]
(unify cond-type :bool)
(unify then-type else-type)
(let [result-subst (unify then-type else-type)]
(instantiate then-type result-subst)))
(and (list? expr) (symbol? (first expr)))
(let [[ctor & args] expr
ctor-type (env-lookup env ctor)]
(cond
...
(func-type? ctor-type)
(let [param-types (:params ctor-type)]
(assert (= (count args) (count param-types)))
(let [ret-type (:ret-type ctor-type)
arg-types (map #(typecheck % env) args)]
(doseq [[param-type arg-type] (map vector param-types arg-types)]
(unify param-type arg-type))
ret-type))))))
([expr] (typecheck expr *env*)))
Nothing too surprising here. Again, we’re going to skip over the macro we use to declare our functions, and just declare some:
(declare-fn [A] identity [x A] A
x)
(type->str (*env* 'identity)) ;; => 'A -> 'A
This is the polymorphic identity function, parameterized over the type
variable A
. Here’s a simple factorial:
(declare-fn factorial [n :number] :number
(if (= n 0)
1
(* n (factorial (- n 1)))))
Note that as before, we’re not binding anything at the top-level, and we can’t execute these functions. They are typechecked (the inferred body type is unified with the annotated return type) at declaration time. And, finally:
(declare-data-type Expr [A]
(LitNum [:number] (Expr :number))
(LitBool [:bool] (Expr :bool))
(Not [(Expr :bool)] (Expr :bool))
(Add [(Expr :number)
(Expr :number)] (Expr :number))
(Eq? [(Expr :number)
(Expr :number)] (Expr :bool))
(If [(Expr :bool)
(Expr A)
(Expr A)] (Expr A)))
(declare-fn [A] evaluate [x (Expr A)] A
(match x
[(LitNum n) n]
[(LitBool b) b]
[(Not e) (not (evaluate e))]
[(Add e1 e2) (+ (evaluate e1) (evaluate e2))]
[(Eq? e1 e2) (= (evaluate e1) (evaluate e2))]
[(If cond then-expr else-expr)
(if (evaluate cond)
(evaluate then-expr)
(evaluate else-expr))]))
;; Expr 'A -> 'A
Happy hacking! Hope there weren’t too many howlers. Full implementation source.
-
We might be tempted to think about this in terms of sets, where
Point2D
is the cartesian product of the set ofFloat
and the set ofFloat
. Set-theoretic type systems are a thing, but we’re sticking with algebraic types here. ↩