Sum types in relational databases: Yet another (IMO pretty neat) way to model them in SQL
Sum types
Sum types are an incredibly useful data-modelling feature that is predominantly used in functional programming languages. For example in Elm we can write something like:
type Animal
= Dog { name : String, age : Integer }
| Bird { song : String }
(It's a silly example, I know...)
The type we defined here restricts the possible values of an Animal
to exactly Dog
or Bird
with the respective attributes. Those restrictions can make it much easier to reason about code and to check it for correctness. We can extend the type by as many constructors as we need (anyone up for Cat
?).
Relational models for sum types
So how do we model this kind of type in SQL? There is no perfect solution, but you can find many good suggestions here:
- Sum types in SQL (by Matt Parson)
- Sum types for relational databases (by Dmitry Olshansky and Denis Redozubov)
While modeling a sum type with several normalized tables is arguably cleaner from a relational standpoint (e.g. one central table for Animal
and related ones for Dog
and Bird
), it comes with a lot of complexity.
Yet another way to model sum types in SQL
The solution I want to demonstrate here is the 'nullable columns' approach from the blog posts above, with a more compact formulation of the not null
constraints.
Data definition
create type animal_type
as enum
( 'dog'
, 'bird'
);
create table animals
( animal_id serial primary key
, animal_type animal_type not null
, dog_name text
check ((animal_type = 'dog') = (dog_name is not null))
, dog_age integer
check ((animal_type = 'dog') = (dog_age is not null))
, bird_song text
check ((animal_type = 'bird') = (bird_song is not null))
);
First, we created a new Postgres type in which we exhaustively enumerate all constructors (dog
and bird
in our example).
Then we define a table where the constructor is tracked in the animal_type
column. The fields of the constructors are all listed in the same table. With the constraint check ((animal_type = 'dog') = (dog_name is not null))
, we make sure that the column is not null when the animal_type
is dog
, and that it is always null otherwise - i.e. only valid combinations.
Referring to other columns in a check constraint of a column (as we do here with animal_type
) is non-standard SQL and will not be portable. I opted for it in this case, as having the constraints attached to each column makes it quite obvious that they complete and correct.
Data modification
Inserting values into this table is straightforward:
insert into animals(animal_type, bird_song) values ('bird', 'lalala');
insert into animals(animal_type, dog_name, dog_age) values ('dog', 'rex', 7);
Note that the null columns will not waste any significant amount of space, as Postgres will efficiently pack each record.
The constraints we set will make sure that only valid values can be inserted or updated:
-- This will fail as we also set the `dog_name`attribute, which belongs
-- to another constructor
insert into animals(animal_type, bird_song, dog_name)
values ('bird', 'lalala', 'rex');
-- Fails as the `dog_age` column is missing, but it's required for `dog`
insert into animals(animal_type, dog_name) values ('dog', 'rex');
Data query
Querying the table is simple:
select * from animals;
select animal_id, dog_name, dog_age from animals where animal_type = 'dog';
On the functional side, decoding the rows back into sum types is relatively simple, we can either use a oneOf
construct or something like Decode.string "animal_type" |> Decode.andThen ...
in Elm.
Discussion
There is no perfect way to model sum types in relational databases. Some are very complex to set up and handle, others are more pragmatic and simpler to handle, but less normalized.
The solution proposed here will not bring any joy to normalization affictionadoes, but it is easy to handle and, due to its simplicity, also performs well.
What do you think?