
You Should Use This to Visualize SQL Joins Instead of Venn Diagrams

 2 years ago
source link: https://towardsdatascience.com/you-should-use-this-to-visualize-sql-joins-instead-of-venn-diagrams-ede15f9583fc

Venn diagrams are so last year

Image by author, inspired by R for Data Science

A couple of weeks ago I shared an article I wrote about SQL anti-joins on Reddit. Not long after, I got this response:

Image by author

This piqued my interest. Up to that point I hadn’t read or heard of anyone who thought Venn diagrams were a bad way to visualize SQL joins, and I’ve been coding in SQL constantly for over 3 years. Personally, I have found Venn diagrams helpful for quickly remembering and visualizing the types of joins between two tables. So this was my response:

Image by author

Following that comment, I got some passionate responses and realized that this topic has already been debated extensively and has some history behind it. As I read more about it, I found a popular Reddit post titled “Say NO to Venn Diagrams When Explaining Joins”. It was interesting to read through others’ opinions about it. I also found a related popular post published in Towards Data Science 2 years ago titled “Can we stop with the SQL JOINs venn diagrams insanity?”.

This debate reminds me of the arguments over how to pronounce SQL, or of the first time I heard about the tabs vs. spaces debate. I decided to write this article after thinking through the arguments from both sides and then finding what I think is an underrated visualization for SQL joins, which I am calling the checkered flag diagram. 🏁

Quick side note: With regard to the pronunciation of SQL, the language was originally named ‘SEQUEL’ and was only changed to ‘SQL’ because of a trademark issue.

Despite having my own opinion on the topic, I believe that both sides of the argument have valid points and that both visualizations are valid ways to represent SQL joins. This debate exists because people on each side have benefited from different learning approaches, and that is ok. Sure, there might be an optimal learning path for the majority of people, but learning is a tailored experience, so I don’t want to discount the benefits others have found by using different visualizations. Please remember, though, that the best way to truly understand SQL joins is to get in the code and practice! SQL Practice.com is a good resource that I have found for practicing SQL.

That being said, I hope to address the key points from both sides in the “Great SQL Venn diagram debate” and propose a solution that might, just maybe, appease people from both sides.

Key Points from the Opposing Sides of the Argument

In an attempt to understand both sides better, I read quite a few opinions on Reddit as well as some articles. Here are the reasons I found for why people disagree with using Venn diagrams to visualize SQL joins:

  • Venn diagrams originate from set theory and therefore shouldn’t be used to visualize joins between two relational tables
  • Some people claim students have a difficult time understanding joins when introduced to the concept using Venn diagrams
  • Venn diagrams aren’t a technically correct representation of what a join is actually doing
  • Venn diagrams have various limitations: for example, they can’t visualize some join types (e.g. cross joins) very well, and they can’t show what happens when duplicate keys occur
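
To make the duplicates and cross join points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables x and y and their values are hypothetical, made up only for illustration. It shows that a join can return more rows than either table holds, which overlapping circles have no way to express.

```python
import sqlite3

# Hypothetical tables x and y, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER, val_x TEXT);
    CREATE TABLE y (id INTEGER, val_y TEXT);
    INSERT INTO x VALUES (1, 'x1'), (2, 'x2'), (2, 'x3');
    INSERT INTO y VALUES (2, 'y1'), (2, 'y2'), (3, 'y3');
""")

# id 2 appears twice in each table, so the inner join matches 2 * 2 = 4 rows.
inner_rows = conn.execute(
    "SELECT x.id, val_x, val_y FROM x INNER JOIN y ON x.id = y.id"
).fetchall()

# A cross join pairs every row of x with every row of y: 3 * 3 = 9 rows.
cross_rows = conn.execute("SELECT * FROM x CROSS JOIN y").fetchall()

print(len(inner_rows), len(cross_rows))  # 4 9
```

Each table has only three rows, yet the inner join returns four and the cross join returns nine.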

Those are the main criticisms I found from those who are against using Venn diagrams. Those in favor of SQL Venn diagrams responded with two main points:

  • While Venn diagrams may not technically be correct, they help people to remember the types of joins and are simpler to understand
  • Joins and set operations can produce exactly the same result, depending on the columns selected
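
The second point can be sketched with a small example, again using Python's sqlite3 module and hypothetical tables x and y with unique ids. When only the key column is selected, an inner join and INTERSECT return the same rows; once other columns are selected, the join returns something a set operation on the keys cannot.

```python
import sqlite3

# Hypothetical tables with unique ids, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, val_x TEXT);
    CREATE TABLE y (id INTEGER PRIMARY KEY, val_y TEXT);
    INSERT INTO x VALUES (1, 'x1'), (2, 'x2'), (3, 'x3');
    INSERT INTO y VALUES (2, 'y2'), (3, 'y3'), (4, 'y4');
""")

# Selecting only the key: the join and the set operation agree.
join_ids = conn.execute(
    "SELECT x.id FROM x INNER JOIN y ON x.id = y.id ORDER BY x.id"
).fetchall()
intersect_ids = conn.execute(
    "SELECT id FROM x INTERSECT SELECT id FROM y ORDER BY id"
).fetchall()
print(join_ids == intersect_ids)  # True

# Selecting more columns: the join combines columns from both tables.
wide_rows = conn.execute(
    "SELECT x.id, val_x, val_y FROM x INNER JOIN y ON x.id = y.id ORDER BY x.id"
).fetchall()
print(wide_rows)  # [(2, 'x2', 'y2'), (3, 'x3', 'y3')]
```

The equivalence here relies on the ids being unique in both tables; with duplicate keys the two queries would diverge.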

Regardless of which side you agree with more, you now have some context for the reason I decided to write this article.

One Alternate Solution to Venn Diagrams

A popular article from 2016 also opposed using Venn diagrams, and its author proposed an alternative called a ‘join diagram’. Here is an example of an inner join visualized as a join diagram:

Image by author, inspired by the jOOQ blog

This diagram is beneficial because it represents the tabular structure used in SQL joins more accurately than Venn diagrams do. The issue is that it shows the primary keys as colors but also puts numbers or letters inside those colors. The letters and numbers inside the rectangles are supposed to represent the columns other than the primary key column (which is represented by the colors), and this is where the visualization starts to break down. Representing multiple columns with one rectangle can be confusing and unintuitive (at least to me). Regardless, this visualization seems to help some people who had difficulty understanding SQL joins. Every visualization has its limitations.

The Checkered Flag Diagram 🏁

As I was reviewing different ways to visualize SQL joins, I found my personal favorite. I am hoping this diagram could bridge the gap between the two sides or at least provide another option to help people understand SQL joins.

This diagram was originally created by Hadley Wickham & Garrett Grolemund in their book “R for Data Science”. The diagrams can be found in the chapter on ‘Relational data’.

I recreated these diagrams in a cheatsheet that I show below, and I also created a GitHub repository so you can download the images here.

Image by author, inspired by R for Data Science

Why I 😍 the Checkered Flag Diagram

Here are all of the reasons I prefer this diagram:

  • Represents a join more accurately than a join diagram since it has primary keys that have identical colors and numbers
  • Shows 1 additional column of values for each table to help visualize what happens to data in columns in addition to the primary key columns
  • Connecting lines simplify the visual, making it easy to see which rows join
  • Similar to a join diagram, an output table resulting from the join is shown to the right
  • Null values are shown when applicable, which is exactly what occurs when executing joins in SQL
  • A cross join can be shown, which is an advantage when compared to Venn diagrams
  • SQL syntax is shown for reference, similar to the Venn diagram cheatsheet here
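
As a rough companion to the diagrams, here is a minimal sketch of a left join using Python's sqlite3 module. The tables x and y and their values are hypothetical, chosen only to resemble the small two-column tables in the diagrams. The output table shows a null (None in Python) where a row of x has no match in y.

```python
import sqlite3

# Hypothetical two-column tables, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, val_x TEXT);
    CREATE TABLE y (id INTEGER PRIMARY KEY, val_y TEXT);
    INSERT INTO x VALUES (1, 'x1'), (2, 'x2'), (3, 'x3');
    INSERT INTO y VALUES (1, 'y1'), (2, 'y2'), (4, 'y3');
""")

# Every row of x is kept; val_y is NULL where y has no matching id.
left_rows = conn.execute(
    "SELECT x.id, val_x, val_y FROM x LEFT JOIN y ON x.id = y.id ORDER BY x.id"
).fetchall()

for row in left_rows:
    print(row)
# (1, 'x1', 'y1')
# (2, 'x2', 'y2')
# (3, 'x3', None)
```

The row with id 3 survives the join with an empty val_y, and the row with id 4 from y is dropped, which matches what the connecting lines and output table in the diagram convey.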

I still believe that Venn diagrams are useful for visualizing SQL joins, but they are limited in the range and accuracy of what they can represent. Hopefully these checkered flag diagrams can be a great reference as you learn SQL. Thanks for reading!

