Idris’ Internals

Note: this is still a fairly raw set of notes taken by David Christiansen at Edwin’s presentation at the 2013 Idris Developers Meeting. They’re in the process of turning into a useful guide - feel free to contribute.

This document assumes that you are already familiar with Idris. It is intended for those who want to work on the internals.

People looking to develop new back ends may want to look at [[Idris back end IRs|Idris-back-end-IRs]]

Core/TT.hs

Idris is compiled to a simple, explicit core language. This core language is called TT because it looks a bit like a Π. It’s a minimal language, with a locally nameless representation. That is, local variables are represented with de Bruijn indices and globally-defined constants are represented with names.

The TT datatype uses a trick that is common in the Idris code: it is polymorphic over the type of names stored in it, and it derives Functor. This allows fmap to be used as a general-purpose traversal.

There is a general construction for binders, used for λ, Π, and let-bindings. These are distinguished using a BinderType.

During compilation, some terms (especially types) will be erased. This is represented using the Erased constructor of TT. A handy trick when generating TT terms is to insert Erased where a term is uniquely determined, as the typechecker will fill it out.

The constructor Proj is a result of the optimizer. It is used to extract a specific constructor argument, in a more economical way than defining a new pattern-matching operation.

The datatype Raw represents terms that have not yet been typechecked. The typechecker converts a Raw to a TT if it can.

Core/CaseTree.hs

Case trees are used to represent top-level pattern-matching definitions in the TT language.

Just as with the TT datatype, the deriving Functor trick is used with SC and CaseAlt to get GHC to generate a function for mapping over contained terms.

Constructor cases (ConCase in CaseAlt) refer to numbered constructors. Every constructor is numbered 0,1,2,…. At this stage in the compiler, the tags are datatype-local. After defunctionalization, however, they are made globally unique.

The n+1 patterns (SucCase) and hacky-seeming things are to make code fast – please ask before “cleaning up” the representation.

Core/Evaluate.hs

This module contains the main evaluator for Idris. The evaluator is used both at the REPL and during type checking, where normalised terms need to be compared for equality.

A key datatype in the evaluator is a context. Contexts are mappings from global names to their values, but they are organized to make type-directed disambiguation quick. In particular, the main part of a name that a user might type is used as the key, and its values are maps from namespaces to actual values.

The datatype Def represents a definition in the global context. All global names map to this structure.

Type and Term are both synonyms for TT.

Datatypes are represented by a TyDecl with the appropriate NameType. A Function is a global constant term with an annotated type, Operator represents primitives implemented in Haskell, and CaseOp represents ordinary pattern-matching definitions. CaseOp has four versions for different purposes, and all are saved because that’s easiest.

CaseInfo: the tc_dictionary is because it’s a type class dictionary which makes totality checking easier.

The normalise* functions give different behaviors - but normalise is the most common.

normaliseC - “resolved” means with names converted to de Bruijn indices as appropriate.

normaliseAll - reduce everything, even if it’s non-total

normaliseTrace - special-purpose for debugging

simplify - reduce the things that are small - the list argument is the things to not reduce.

Core/Typecheck.hs

Standard stuff. Hopefully no changes are necessary.

Core/Elaborate.hs

Idris definitions are elaborated one by one and turned into the corresponding TT. This is done with a tactic language as an EDSL in the Elab monad (or Elab’ when there’s a custom state).

Lots of plumbing for errors.

All elaboration is relative to a global context.

The string in the pair returned by elaborate is log information.

See JFP paper, but the names don’t necessarily map to each other. The paper is the “idealized version” without logging, additional state, etc.

All the tactics take Raws, typechecking happens there.

claim (x : t) assumes a new x : t.

PLEASE TIDY THINGS UP!

proofSearch flag to try’ is whether the failure came from a human (so fail) or from a machine (so continue)

Idris-level syntax for providing alternatives explicitly: (| x, y, z |) try x, y, z in order, and take the first that succeeds.

Core/ProofState.hs

Core/Unify.hs

Deals with unification. Unification can reply with: - this works - this can never work - this will work if these other unification problems work out (eg unifying f x with 1)

match_unify: same thing as unification except it’s just matching name against name, term against term. x + y matches to 0 + y with x = 0. Used for <== syntax as well as type class resolution.

Idris/AbsSyntaxTree.hs

PTerm is the datatype of Idris syntax. P is for Program. Each PTerm turns into a TT term by applying a series of tactics.

IState is the major interpreter state. The global context is the tt_ctxt field.

Ctxt maps possibly ambiguous names to their referents.

Idris/ElabDecls.hs

This is where the actual elaboration from PTerm to TT happens.

Idris/ElabTerm.hs

build is the function that creates a Raw. All the “junk” is to deal with things like metavars and so forth. It has to remember what names are still to be defined, and it doesn’t yet know the type (filled in by unificaiton later). Also case expressions have to turn into top-level functions.

resolveTC is type class resolution.