18

Recursive SQL Queries with PostgreSQL

 4 years ago
source link: https://towardsdatascience.com/recursive-sql-queries-with-postgresql-87e2a453f1b?gi=e6735ff08500
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Common Table Expression is lesser-known feature of SQL, that makes it possible to write recursive queries. Let’s explore it with PostgreSQL!

Mar 16 ·6min read

Nr2MRrM.jpg!web

Photo by Ludde Lorentz on Unsplash

When working with databases, most of the time, all you need is SELECT , UPDATE (CRUD operations), few JOIN s and WHERE clauses and that's about it. But, sometimes you will run into datasets and tasks, that will require you to use much more advanced features of SQL. One of them is CTE or common table expression and more specifically, Recursive CTE , which we can use to make recursive SQL queries. Let's look at how it works and what we can do with it!

Why Recursive Queries?

First of all, why is this actually even a thing? What are recursive queries good for? — In general recursive queries come in handy when working with self-referential data or graph/tree-like data structures. Just a few examples of these use cases are:

Self-referential data:

  • Manager -> Subordinate (employee) relationship
  • Category -> Subcategory -> Product relationship
  • Graphs — Flight (Plane Travel) map

Trees:

  • Any taxonomy system — books, animals, genetics…
  • Links between articles — for example on Wikipedia
  • Comment section — for example threads on Reddit

For any of these examples, it would require you to build temporary tables or use cursors to actually get useful data from such datasets. Now that I hopefully persuaded you that recursive queries are pretty useful, let’s see what they are and how we can use them.

What Will We Need?

Recursive queries leverage something called CTE — Common Table expression , which is SQL WITH clause, more commonly used for simplifying very complex queries (subqueries). Let's look at an example of that:

This is a very simple example, but you can surely imagine how this can be used to make very complicated queries containing multiple subqueries much more readable.

Apart from the syntax above, we will also need some data that we can use for our recursive queries. For that we will use Manager — Subordinate hierarchy using one self-referencing table, defined like so:

i2MFfuY.png!web

Employee Table

Here’s SQL to create such a table including some data to play with:

So, with that out of the way, let’s build some recursive queries!

To iterate is human, to recur, divine — L. Peter Deutsch

Recursion

Let’s start with a formal structure of recursive SQL query:

It looks pretty simple, but doesn’t tell us much, so let’s see a human-readable example:

And the output of that query:

In this example, our non-recursive base case is just SELECT 1 and recursive part is SELECT n+1 FROM nums WHERE n+1 <= 10 . This part creates recursion by referring to name of CTE ( nums ) and unioning all the results. At the end we have WHERE clause to terminate the recursion so that our query doesn't run infinitely.

Real World Examples

The previous section showed a very basic example that explains how it works but doesn’t really show how the recursive CTE can help us. So, let’s go through some examples using Manager — Subordinate hierarchy data from above:

Getting “manager tree” for some employee:

This query can be used to generate a list of all subordinates of some manager, which essentially is subtree starting from some specific employee. As a base case here, we select one employee (manager) by ID and we also include level to indicate how many levels down we went in the tree. In the recursive part we JOIN employees table to CTE ( managers ) using manager IDs. Again, we include level by incrementing level of results from the previous step of recursion. This is the output:

To better visualize what happens there, look at the following tree. The highlighted part is what is returned by query:

VVj2U3z.png!web

Another practical example using same data is computing degrees of separation between 2 employees :

The recursive CTE is quite similar to the one in the previous example. To simplify things, we only select ID and degree (instead of level). Next, we need a query that looks up and tells us degrees of separation for 2 employees - to do so, we join employees table to our previously built rec table containing IDs and degrees using IDs in respective tables. In the final SELECT part we do some formatting to get nicer output and in WHERE clause we optionally specify the other (second) employee for whom we want the degrees of separation. Output:

Again, playing with the same data, let’s find out what might be the job progression in the hypothetical company:

This time we run the recursion from the bottom up. This is achieved by flipping the ON clause, compared to previous examples. To create readable output, we use STRING_AGG function, which takes rows from above SELECT and concatenates them using > as a delimiter. This is what the query gives us:

Recursion on Graphs

Enough of that employee data set, let’s explore what we can to with graphs — graphs as a data structure are an ideal candidate for traversing using recursion. So, let’s first define a simple table and insert some data, so we have something to play with:

6RJvquI.png!web

Graph

As an example, we will do the obvious thing — find paths as graph. For that, we will need a PostgreSQL special — ARRAY data type. Arrays are not a common relational database feature but are very handy here for storing paths. If you are not familiar with this data type, then these are the things you need for understanding SQL below:

  • Create array — ARRAY[value_1, value_2]
  • Concatenate array — ARRAY[value_1, value_2] || ARRAY[value_3]
  • ALL function - "x" != ALL(ARRAY["a", "b", "c"]) - compares "x" to each value in array, results is true if all comparisons result in true

Alright, now for the query:

The query above first selects all the direct neighbours in the non-recursive case. Then in recursive part, we build on the base case (and during subsequent iterations on previous recursive cases) and append available edge to existing paths and replace destination with the same edge. We do this for every row where the end of the existing path ( dest ) is same as the start of the selected edge and where the new destination is not yet in the path (to prevent cycles). And this is the full output:

Aside from the useful things above, you can also use recursive CTE with infinite streams of data:

The recursive table is built iteratively with each step and each step produces a finite number of rows, therefore it’s possible to use LIMIT to retrieve first n rows from infinite query!

Bonus: Recursive VIEW

As a quick little bonus, I want to also show, that it is possible (as of PostgreSQL 9.3) to even create recursive views:

Even though this is just syntax sugar, it can make your life easier when working with some of recursive queries and data.

Conclusion

Recursion is a very powerful tool, that can solve some problems in a very elegant way. Using it with SQL is not so common though, so hopefully, this article showed you some ways for how to leverage it, in case you run into data that can only be processed recursively.

Recursion, however, can make your code less readable, so use it sparingly, especially with SQL, as it is often not that easy parse and understand even without recursion.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK