GADTs - Broad overview

Introduction

There are multiple levels to the implementation. They deal with slightly different problems. The most important levels are the following ones:

Figuring out which relationships are necessary (PatternTypeConstrainer)
Breaking down the relationships (TypeComparer)
Recording and bookkeeping of relationships (GadtConstraint)
Looking up the information (TypeComparer, GadtConstraint)

Out of the levels above, 4. is by far the simplest one.

There are also other parts to supporting GADTs. Roughly in order of importance, they are:

At specific points, abstract types are added to GadtConstraint. For instance, when entering a method, we add all type parameters to GadtConstraint.
Casts need to be inserted when GADTs were used for type comparison.
1. TypeComparer keeps track of whether a GADT constraint was used in a mutable variable usedGadt.
GADT constraints need to be stored in attachments to be used in PostTyper.
1. Attachment key is named inferredGadtConstraints.
When we select members on a type that may have GADT constraints, we perform special "healing" by approximating the type using those constraints. We cannot take the constraints into account because member lookup is cached, and GADT constraints are only valid for specific scopes.

Useful widgets

Expr

This is the classical GADT example:

enum Expr[T] {
  case IntLit(value: Int) extends Expr[Int]
  case BoolLit(value: Boolean) extends Expr[Boolean]
  case IfExpr(
    cond: Expr[Boolean],
    when: Expr[T],
    otherwise: Expr[T],
  )
}

EQ

The following enum will result in an equality constraint between S and T if we match on it:

enum EQ[S, T] {
  case Refl[U]() extends EQ[U, U]
}

SUB

The following enum will result in a subtyping constraint S <: T if we match on it:

enum SUB[-S, +T] {
  case Refl[U]() extends SUB[U, U]
}

Details of above

What abstract types can have GADT constraints

Right now, we record GADT constraints for:

function/method type parameters
class type parameters

There is a branch on the way which will also record them for type members (so path-dependent types) and singleton types. It has a paper associated: "Implementing path-dependent GADTs for Scala 3".

What are necessary relationships? Any examples?

Covariance means no constraint is necessary

Standard (non-case) classes allow "strange" inheritance which means that we cannot infer any information from covariant type parameters.

class Expr[+A]
class IntList extends Expr[List[Int]]

def foo[T](e: Expr[List[T]]): T =
  e match {
    case _ : IntList =>
      // e : Expr[List[Int]]
      // T <: Int
      0
  }

class Weird(list: List[String]) extends IntList with Expr[Nothing]

Case classes have a special check which disallows inheritance like Weird. This means we can infer extra information from them.

Breaking down the constraints

class Expr[A]
class IntList extends Expr[List[Int]]

def foo[T](e: Expr[List[T]]): T =
  e match {
    case _ : IntList =>
      // Level 1:
      //   We start with e : Expr[List[T]]
      //   We check that e : IntList <: Expr[List[Int]
      //   Expr is invariant,
      //     so we have List[Int] <: List[T] , List[T] <: List[Int]
      // Level 2:
      //   We compare List[Int] <: List[T]
      //   We record  Int <: T
      //   We compare List[T] <: List[Int]
      //   We record  T <: Int
      0
  }

Relation betweeen GadtConstraint and OrderingConstraint

Internal and external types

GadtConstraint uses OrderingConstraint as the datastructure to record information about GADT constraints.

OrderingConstraint only supports working with TypeParamRefs.

GadtConstraint wants to record information about things other than TypeParamRefs.

To solve this, GadtConstraint internally creates TypeParamRefs which it adds to the OrderingConstraint it keeps internally. Whenever a GADT constraint is added, we "internalize" the type by replacing all the external types with the internal TypeParamRefs. Whenever we take bound information out of the GADT constraint, we need to "externalize" the types by replacing the internal TypeParamRefs with their external versions. To implement this, GadtConstraint keeps a bidirectional mapping between the external types and the internal TypeParamRefs.

The TypeParamRefs and TypeVars registered in one constraint cannot ever be present in types mentioned in the other type constraint. The internal TypeParamRefs and TypeVars cannot ever leak out of the GadtConstraint. We cannot ever record a bound in GadtConstraint which mentions TypeParamRefs used for type inference. (That part is ensured by the way TypeComparer is organised – we will always try to record bounds in the "normal" constraint before recording a GADT bound.)

Other details

TypeComparer approximations

TypeComparer sometimes approximates the types it compares. Let's see an example based on these definitions:

class Expr[T]
class IntList extends Expr[Int]

when comparing if IntList <: Expr[Int], TypeComparer will approximate IntList to Expr[Int]. Then it will compare Expr[Int] <: Expr[Int] with appropriate variables set.

The variables which TypeComparer sets are approxState and frozenGadt.

Necessary/sufficient either

TypeComparer sometimes needs to approximate some constraints, specifically when dealing with intersection and union types. The way this approximation works changes if we're currently inferring GADT constraints. This is hopefully documented well in TypeComparer in doc comments for necessaryEither and sufficientEither.

Types bound in patterns

(list : List[Int]) match {
  case lst : List[a] =>
    // a is a new type bound in the pattern.
    // We do not record any information about a.
    // We should know that a <: Int.
    // (Or it's fine to just have a =:= Int.)
    // We would not have this issue if we used a custom unapply.
    // Type case patterns create a fresh symbol even if they shouldn't.
    // (See indexPattern in Typer.)
}

Internal structure of OrderingConstraint

Imagine we have two type parameters in scope, A and B.

We could record the following in ctx.gadt:

A <: Int
B <: A

Then, we expect that calling ctx.gadt.bounds(`B`) will return `<: Int`.

In order to handle this, GadtConstraint relies on OrderingConstraint. Internally, it will represent the above constraints as follows:

A <: Int
B <: Int
B <: A

The first two constraints are "entries" – they are easy to look up whenever we ask for bounds of A or B. The third constraint is an ordering – it helps with correctly propagating the bounds we record.

Possible broad improvements

Allow OrderingConstraint to record bounds for things other than TypeParamRefs

This would mean we no longer need to keep the bidirectional mapping in GadtConstraint.

Not mixing OrderingConstraint and ConstraintHandling in GadtConstraint

GadtConstraint right now mixes OrderingConstraint and ConstraintHandling. The first one is supposed to be the immutable constraint datastructure. The second one implements mutable functionality around a variable containing the immutable datastructure.

GadtConstraint mixes them both. Things would be better organised if GadtConstraint was split like the normal constraint.

Creating a separate TypeComparer for breaking down types into GADT constraints

TypeComparer is biased towards one specific way of approximating constraints. When we infer types, it's ok to be "optimistic". When inferring GADT constraints, we should be as pessimistic as possible, in order to only infer constraints which are necessary.

TypeComparer was originally written to support type inference and GADTs were only added later on. This means that the "default" way TypeComparer approximates wasn't even noticeable before, but when inferring GADT constraints could result in inferring unsound information.

We could potentially fix this by creating a special TypeComparer which would only "break down" subtype relationships to record GADT constraints.