|
| 1 | +# Gradual set-theoretic types |
| 2 | + |
| 3 | +Elixir is in the process of incorporating set-theoretic types into the compiler. This document outlines the current stage of our implementation. |
| 4 | + |
| 5 | +The current milestone aims to infer types from patterns and guards and use them to type check programs, enabling the Elixir compiler to find faults and bugs in codebases without requiring changes to existing software. The underlying principles, theory, and roadmap of our work have been outlined in ["The Design Principles of the Elixir Type System" by Giuseppe Castagna, Guillaume Duboc, José Valim](https://arxiv.org/abs/2306.06391). |
| 6 | + |
| 7 | +## Supported types |
| 8 | + |
| 9 | +At the moment, Elixir developers will interact with set-theoretic types only through warnings found by the type system. All data types in the language are modelled: |
| 10 | + |
| 11 | + * `binary()`, `integer()`, `float()`, `pid()`, `port()`, `reference()` - these types are indivisible. This means both `1` and `13` get the same `integer()` type. |
| 12 | + |
| 13 | + * `atom()` - it represents all atoms and it is divisible. For instance, the atom `:foo` and `:hello_world` are also valid (distinct) types. |
| 14 | + |
| 15 | + * `map()` and structs - maps can be "closed" or "open". Closed maps only allow the specified keys, such as `%{key: atom(), value: integer()}`. Open maps support any other keys in addition to the ones listed and their definition starts with `...`, such as `%{..., key: atom(), value: integer()}`. Structs are closed maps with the `__struct__` key. |
| 16 | + |
| 17 | + * `tuple()`, `list()`, and `function()` - currently they are modelled as indivisible types. The next Elixir versions will also introduce fine-grained types here. |
| 18 | + |
| 19 | +## Set operations |
| 20 | + |
| 21 | +We can compose set-theoretic types by using set operations (hence the name). For example, to say a function returns either atoms or integers, one could write: `atom() or integer()`. |
| 22 | + |
| 23 | +Intersections are available via the `and` operator, such as `atom() and integer()`, which in this case it becomes the empty set `none()`. `term()` is the union of all types, also known as the "top" type. |
| 24 | + |
| 25 | +Intersections are useful when modelling functions. For example, imagine the following function: |
| 26 | + |
| 27 | +```elixir |
| 28 | +def negate(x) when is_integer(x), do: -x |
| 29 | +def negate(x) when is_boolean(x), do: not x |
| 30 | +``` |
| 31 | + |
| 32 | +If you give it an integer, it negates it. If you give it a boolean, it negates it. |
| 33 | + |
| 34 | +We can say this function has the type `(integer() -> integer())` because it is capable of receiving an integer and returning an integer. In this case, `(integer() -> integer())` is a set that represents all functions that can receive an integer and return an integer. Even though this function can receive other arguments and return other values, it is still part of the `(integer() -> integer())` set. |
| 35 | + |
| 36 | +This function also has the type `(boolean() -> boolean())`, because it receives the booleans and returns booleans. Therefore, we can say the overall type of the function is `(integer() -> integer()) and (boolean() -> boolean())`. The intersection means the function belongs to both sets. |
| 37 | + |
| 38 | +At this point, some may ask, why not a union? As a real-world example, take a t-shirt with green and yellow stripes. We can say the t-shirt belongs to the set of "t-shirts with green color". We can also say the t-shirt belongs to the set of "t-shirts with yellow color". Let's see the difference between unions and intersections: |
| 39 | + |
| 40 | + * `(t_shirts_with_green() or t_shirts_with_yellow())` - contains t-shirts with either green or yellow, such as green, green and red, green and blue, yellow, yellow and red, etc. |
| 41 | + |
| 42 | + * `(t_shirts_with_green() and t_shirts_with_yellow())` - contains t-shirts with both green and yellow (and also other colors) |
| 43 | + |
| 44 | +Since the t-shirt has both colors, we say it belongs to the intersection of both sets. The same way that a function that goes from `(integer() and integer())` and `(boolean() -> boolean())` is also an intersection. In practice, it does not make sense to define the union of two functions in Elixir, so the compiler will always point to the right direction. |
| 45 | + |
| 46 | +Finally, we can also negate types by using `not`. For example, to express all atoms, except the atoms `:foo` and `:bar`, one can write: `atom() and not (:foo or :bar)`. |
| 47 | + |
| 48 | +## The `dynamic()` type |
| 49 | + |
| 50 | +Elixir is a functional dynamic programming language. Existing Elixir programs do not have type declarations, but we still want to be able to type check them. This is done with the introduction of the `dynamic()` type. |
| 51 | + |
| 52 | +When Elixir sees the following function: |
| 53 | + |
| 54 | +```elixir |
| 55 | +def negate(x) when is_integer(x), do: -x |
| 56 | +def negate(x) when is_boolean(x), do: not x |
| 57 | +``` |
| 58 | + |
| 59 | +Elixir type checks it as if the function had the type `(dynamic() -> dynamic())`. We say `dynamic()` is a gradual type, which leads us to *gradual set-theoretic types*. |
| 60 | + |
| 61 | +The simplest way to reason about `dynamic()` in Elixir is that it is a range of types. If you have a type `atom() or integer()`, the underlying code needs to work with both `atom() or integer()`. For example, if you call `Integer.to_string(var)`, and `var` has type `atom() or integer()`, the type system will emit a warning, because `Integer.to_string/1` does not accept atoms. |
| 62 | + |
| 63 | +However, by intersecting any type with `dynamic()`, we make the type gradual and therefore only a subset of the type needs to be valid. For instance, if you call `Integer.to_string(var)`, and `var` has type `dynamic() and (atom() or integer())`, the type system will not emit a warning, because `Integer.to_string/1` works with at least one of the types. This flexibility makes `dynamic()` excellent for typing existing code, because it will only emit warnings once it is certain the code will fail. For convenience, most programs will write `dynamic(atom() or integer())` instead of the intersection. They have the same behaviour. |
| 64 | + |
| 65 | +Once Elixir introduces a type language into its API, Elixir programs will behave as a statically typed language, unless the `dynamic` type is used. This brings us to one last remark about dynamic types in Elixir: dyamic types are always at the root. For example, when you write a tuple of type `{:ok, dynamic()}`, Elixir will rewrite it to `dynamic({:ok, term()})`. While this has the downside that you cannot make part of a tuple/map/list gradual, only the whole tuple/map/list, it comes with the upside that dynamic is always explicitly at the root, making it harder to accidentally sneak `dynamic()` in a statically typed program. |
| 66 | + |
| 67 | +## Roadmap |
| 68 | + |
| 69 | +The current milestone is to implement type inference and type checking of Elixir programs without changes to the Elixir language. At this stage, we want to collect feedback on the quality of error messages and performance, and therefore the type system has no user facing API. |
| 70 | + |
| 71 | +If the results are satisfactory, the next milestone will include a mechanism for defining typed structs. Elixir programs frequently pattern match on structs, which reveals information about the struct fields, but it knows nothing about their respective types. By propagating types from structs and their fields throughout the program, we will increase the type system’s ability to find errors while further straining our type system implementation. Proposals including the required changes to the language surface will be sent to the community once we reach this stage. |
| 72 | + |
| 73 | +The third milestone is to introduce the type annotations for functions. Once we conclude this stage, the existing typespecs system will be phased out of the language and moved into a separate library. |
| 74 | + |
| 75 | +## Acknowledgements |
| 76 | + |
| 77 | +The type system was made possible thanks to a partnership between [CNRS](https://www.cnrs.fr/) and [Remote](https://remote.com/). The research was partially supported by [Supabase](https://supabase.com/) and [Fresha](https://www.fresha.com/). The development work is sponsored by [Fresha](https://www.fresha.com/), [Starfish*](https://starfish.team/), and [Dashbit](https://dashbit.co/). |
0 commit comments