January 16th, 2020

Writing SQL for humans with Literate SQL

Article cover photo

In this post, I'll briefly explain Literate Programming and show how you can apply that concept to make your SQL scripts easier to understand (for humans!) and more maintainable.

Literate Programming

Literate programming means writing code primarily for humans instead of for machines.

The first step towards Literate Programming is to switch the comments and code in our source code. That might sound weird at first... 😛 The easiest way to explain this is with an example. Let's start with the following Python script (please don't mind any details, we'll get to SQL soon!):

# The best kind of jokes are Chuck Norris jokes, so we create a function
# to get them from the ICNDB API.

def random_joke():
     '''Get a random Chuck Norris joke from the ICNDB API.'''
     return requests.get('https://api.icndb.com/jokes/random').json()

# Next, we set up a web server to show the jokes to our users.

...

In a literate program, we would write something like this instead:

The best kind of jokes are Chuck Norris jokes, so we create a function
to get them from the ICNDB API.

> def random_joke():
>     '''Get a random Chuck Norris joke from the ICNDB API.'''
>     return requests.get('https://api.icndb.com/jokes/random').json()

Next, we set up a web server to show the jokes to our users.

...

Note that the comments are now regular plain text, and that the code to be executed has been specially marked. As a result, our explanations to humans have taken precedence over the code!

This literate style will not make sense for all kinds of code and might seem a bit silly in this small example, but it can be very useful for programs where explaining your thoughts and approach is more important than the code itself.

The concept of Literate Programming was introduced ages ago (in programming time 🙃) by Donald E. Knuth. Back in 1983 he suggested:

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

Since then, a few programming languages got native support for running literate programs. For example:

Beyond this, 'Notebooks' have become quite popular and are very close to the concept of literate programming:

Literate SQL

SQL is a very expressive language where you often put much thought and testing into writing a terse query. Accordingly, applying literate programming to SQL can make a lot of sense!

I was not able to find any existing tooling that supports Literate Programming in SQL (please let me know if you do!). The Notebook solutions are close and they are fine for querying an existing schema, but they are not a good fit if we want to create a new schema or a whole database application. So let's create our own simple tooling that works for SQL combined with Markdown files.

Time for an example with SQL! Let's convert the following script into a literate program:

-- POSTS TABLE

-- Create a table in which we will track the posts created by our users.

create table posts
    ( post_id  serial primary key
    , title    text
    );

comment on table posts is
    'Posts with a title.';

comment on column posts.title is
    'Title of the post';

Note that we used comments for a few different purposes in this script:

  • Creating a structure or outline for our script (-- POSTS TABLE)
  • Explaining what we are doing and why (-- Create a table...)
  • Attaching comments directly to database objects (comment on .. is ...;).

The comment on ... statements are very similar to docstrings in Python, which can be accessed interactively or used to generate documentation. Likewise, the comments in SQL can be very useful for people exploring our schema (e.g. with the family of \d commands in psql, in GUI tools like DBeaver or via the OpenAPI spec generated by PostgREST).

The Markdown markup language is a good fit for turning this script into a literate SQL file:

  • We can describe our outline using Markdown headers (#, ## etc.)
  • Our explanations can be written in plain text with Markdown markup where needed (e.g. for links)
  • We can put our SQL code into Markdown code blocks.

So we can turn our SQL script from above into the following .md file:

# Posts table

Create a table in which we will track the posts created by our users.

    create table posts
        ( postid serial primary key
        , title text
        );

    comment on table posts is
        'Posts with a title.';

    comment on column posts.title is
        'Title of the post';

This mechanical conversion is only the starting point, of course, and we would now add further explanations, structure and links as needed.

We left the comment on ... statements as they were in the original script, as they are unchangedly useful the users of our schema. Note that we did the same with the docstrings in the Python example that we started with.

Many text editors will automatically provide the right syntax highlighting for the SQL code blocks that we embedded in the Markdown file, for example Visual Studio Code does it out of the box. It would be even better if we used fenced Markdown code blocks, where we can explicitly set the sql language for highlighting (see the full example below).

It's very easy to render the Markdown file into HTML or PDF for documentation and comfortable reading. GitHub and GitLab will, for example, automatically display pretty previews for our literate code.

Complete example of a Literate SQL script

The value of Literate Programming in SQL becomes clearer with larger scripts. The following file is a complete example of a literate SQL script:

app.sql.md

This example is much more verbose than setting up a schema would usually warrant, as it's intended to be a tutorial for new PostgREST users at some point. Please keep that in mind and don't be that be that chatty in your literal SQL files! :-)

Even though this script is used to set up a relatively complex application with multiple schemas, indexes, stored procedures and tests, it should be possible for a new reader to understand how the application works relatively quickly. The option to set links to other parts of the code and to external resources also turned out to be very useful.

But, how can we now run the SQL script that has been 'buried' in that file? We'll solve that small problem in the next section.

Tooling for Literate SQL

Literate SQL is not directly supported by the usual SQL tools like the psql utility from Postgres, but we can easily create our own tooling in order to run literate SQL programs. Short of using a full Markdown parser (e.g. in a Python script), we can use the sed utility that is available on most Unix machines to pull out the SQL code blocks from our Markdown files.

For example, if we wanted to convert a literate SQL/Markdown file app.sql.md into a regular SQL script, we can run:

sed -f md2sql.sed <app.sql.md >app.sql

You can get md2sql.sed here. It's only a two-liner sed-script to which I added many comments, as this is one of the very expressive but cryptic examples that sed is notorious for 😜.

Now that we have turned our literate app.sql.md file into a regular SQL script app.sql, we can run it directly in our database, as we would if we had written a plain SQL script in the first place:

psql -f app.sql

We are using the psql utility from Postgres in this example, but it will work similarly for any other relational database. Let's do everything in one command:

sed -f md2sql.sed app.sql.md | psql

This pulls out the SQL code blocks out of our Markdown file and uses psql to directly run them in our database.

That's all you need to write and use literate SQL scripts! 🎉

Writing SQL for humans

I'm a big fan of tools like PostgREST and of leveraging the full power of relational databases when developing applications. Using a literate style in Postgres applications (like the example for session-based authentication with PostgREST) goes a long way in making the code easier to understand and to validate.

The tooling for literate SQL that we created here is very simple and Markdown is a markup language that most developers are familiar with, so I don't think that there are any significant downsides to using Literate SQL as shown in this post. The upside of having clearer, more maintainable code can, however, be huge!

Question: Would you use literate SQL in your projects?

Please comment, I'm curious to hear your opinion and looking forward to discussing use-cases with you. Let's figure out how to make this as easy and useful as possible!

In upcoming posts, I plan to write about maintaining SQL applications without annoying migrations and how to easily create disposable Postgres databases for development and testing without containers. Please stay tuned!

Remo
Remo
Elm, Nix and Postgres enthusiast.