Dotty Internals 1: Trees & Symbols (Meeting Notes)

These are meeting notes for the Dotty Internals 1: Trees & Symbols talk by Dmitry Petrashko on Mar 21, 2017.

Entry point

dotc/Compiler.scala

The entry point to the compiler contains the list of phases and their order.

Phases

Some phases executed independently, but others (miniphases) are grouped for efficiency. See the paper "Miniphases: Compilation using Modular and Efficient Tree Transformation" for details.

Trees

dotc/ast/Trees.scala

Trees represent code written by the user (e.g. methods, classes, expressions). There are two kinds of trees: untyped and typed.

Unlike other compilers (but like scalac), dotty doesn't use multiple intermediate representations (IRs) during the compilation pipeline. Instead, it uses trees for all phases.

Dotty trees are immutable, so they can be shared.

Untyped trees

dotc/ast/untpd.scala

These are the trees as output by the parser.

Some trees only exist as untyped: e.g. WhileDo and ForDo. These are desugared by the typechecker.

Typed trees

dotc/ast/tpd.scala

Typed trees contain not only the user-written code, but also semantic information (the types) about the code.

Notes on some tree types

ThisTree

Tree classes have a ThisTree type field which is used to implement functionality that's common for all trees while returning a specific tree type. See withType in the Tree base class, for an example.

Additionally, both Tree and ThisTree are polymorphic so they can represent both untyped and typed trees.

For example, withType has signature def withType(tpe: Type)(implicit ctx: Context): ThisTree[Type]. This means that withType can return the most-specific tree type for the current tree, while at the same time guaranteeing that the returned tree will be typed.

Creating trees

You should use the creation methods in untpd.scala and tpd.scala to instantiate tree objects (as opposed to creating them directly using the case classes in Trees.scala).

Meaning of trees

In general, the best way to know what a tree represents is to look at its type or denotation; pattern matching on the structure of a tree is error-prone.

Errors

dotc/typer/ErrorReporting.scala

Sometimes there's an error during compilation, but we want to continue compilling (as opposed to failing outright), to uncover additional errors.

In cases where a tree is expected but there's an error, we can use the errorTree methods in ErrorReporting to create placeholder trees that explicitly mark the presence of errors.

Similarly, there exist ErrorType and ErrorSymbol classes.

Assignment

The closest in Dotty to what a programming language like C calls an "l-value" is a RefTree (so an Ident or a Select). However, keep in mind that arbitrarily complex expressions can appear in the lhs of an assignment: e.g.

trait T {
  var s = 0
}
{
  class T2 extends T
  while (true) 1
  new Bla
}.s = 10

Another caveat, before typechecking there can be some trees where the lhs isn't a RefTree: e.g. (a, b) = (3, 4).

Symbols

dotc/core/Symbols.scala

Symbols are references to definitions (e.g. of variables, fields, classes). Symbols can be used to refer to definitions for which we don't have ASTs (for example, from the Java standard library).

NoSymbol is used to indicate the lack of a symbol.

Symbols uniquely identify definitions, but they don't say what the definitions mean. To understand the meaning of a symbol we need to look at its denotation (spefically for symbols, a SymDenotation).

Symbols can not only represent terms, but also types (hence the isTerm/isType methods in the Symbol class).

ClassSymbol

ClassSymbol represents either a class, or an trait, or an object. For example, an object

object O {
  val s = 1
}

is represented (after Typer) as

class O$ { this: O.type =>
  val s = 1
}
val O = new O$

where we have a type symbol for class O$ and a term symbol for val O. Notice the use of the selftype O.type to indicate that this has a singleton type.

SymDenotation

dotc/core/SymDenotations.scala

Symbols contain SymDenotations. The denotation, in turn, refers to: