Union Types - More Details
Syntax
Syntactically, unions follow the same rules as intersections, but have a lower precedence, see Intersection Types - More Details.
Interaction with pattern matching syntax
|
is also used in pattern matching to separate pattern alternatives and has
lower precedence than :
as used in typed patterns, this means that:
case _: A | B => ...
is still equivalent to:
case (_: A) | B => ...
and not to:
case _: (A | B) => ...
Subtyping Rules
A
is always a subtype ofA | B
for allA
,B
.- If
A <: C
andB <: C
thenA | B <: C
- Like
&
,|
is commutative and associative:A | B =:= B | A A | (B | C) =:= (A | B) | C
&
is distributive over|
:A & (B | C) =:= A & B | A & C
From these rules it follows that the least upper bound (lub) of a set of types is the union of these types. This replaces the definition of least upper bound in the Scala 2 specification.
Motivation
The primary reason for introducing union types in Scala is that they allow us to guarantee that for every set of types, we can always form a finite lub. This is both useful in practice (infinite lubs in Scala 2 were approximated in an ad-hoc way, resulting in imprecise and sometimes incredibly long types) and in theory (the type system of Scala 3 is based on the DOT calculus, which has union types).
Additionally, union types are a useful construct when trying to give types to existing dynamically typed APIs, this is why they're an integral part of TypeScript and have even been partially implemented in Scala.js.
Join of a union type
In some situation described below, a union type might need to be widened to
a non-union type, for this purpose we define the join of a union type T1 | ... | Tn
as the smallest intersection type of base class instances of
T1
,...,Tn
. Note that union types might still appear as type arguments in the
resulting type, this guarantees that the join is always finite.
Example
Given
trait C[+T]
trait D
trait E
class A extends C[A] with D
class B extends C[B] with D with E
The join of A | B
is C[A | B] with D
Type inference
When inferring the result type of a definition (val
, var
, or def
), if the
type we are about to infer is a union type, we replace it by its join.
Similarly, when instantiating a type argument, if the corresponding type
parameter is not upper-bounded by a union type and the type we are about to
instantiate is a union type, we replace it by its join. This mirrors the
treatment of singleton types which are also widened to their underlying type
unless explicitly specified. and the motivation is the same: inferring types
which are "too precise" can lead to unintuitive typechecking issues later on.
Note: Since this behavior limits the usability of union types, it might be changed in the future. For example by not widening unions that have been explicitly written down by the user and not inferred, or by not widening a type argument when the corresponding type parameter is covariant. See #2330 and #4867 for further discussions.
Example
import scala.collection.mutable.ListBuffer
val x = ListBuffer(Right("foo"), Left(0))
val y: ListBuffer[Either[Int, String]] = x
This code typechecks because the inferred type argument to ListBuffer
in the
right-hand side of x
was Left[Int, Nothing] | Right[Nothing, String]
which
was widened to Either[Int, String]
. If the compiler hadn't done this widening,
the last line wouldn't typecheck because ListBuffer
is invariant in its
argument.
Members
The members of a union type are the members of its join.
Example
The following code does not typecheck, because hello
is not a member of
AnyRef
which is the join of A | B
.
trait A { def hello: String }
trait B { def hello: String }
def test(x: A | B) = x.hello // error: value `hello` is not a member of A | B
Exhaustivity checking
If the selector of a pattern match is a union type, the match is considered exhaustive if all parts of the union are covered.
Erasure
The erased type for A | B
is the erased least upper bound of the erased
types of A
and B
. Quoting from the documentation of TypeErasure#erasedLub
,
the erased lub is computed as follows:
- if both argument are arrays of objects, an array of the erased lub of the element types
- if both arguments are arrays of same primitives, an array of this primitive
- if one argument is array of primitives and the other is array of objects, Object
- if one argument is an array, Object
- otherwise a common superclass or trait S of the argument classes, with the
following two properties:
- S is minimal: no other common superclass or trait derives from S
- S is last : in the linearization of the first argument type
|A|
there are no minimal common superclasses or traits that come after S. The reason to pick last is that we prefer classes over traits that way, which leads to more predictable bytecode and (?) faster dynamic dispatch.