Learning Languages From ABCs to SQL
source link: https://www.tuicool.com/articles/hit/AniEJvV
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Learning Languages From ABCs to SQL
A mini-dive into SQL queries beyond the basic SELECT and FROM commands.
If you can get past the cheesiness of my title I’d like to teach you some SQL querying commands beyond the basic SELECT
and FROM
commands you might see as requirements on a job description. I will be writing code you can use to query the StackOverflow dataset via Google BigQuery Public Datasets . You can query this data up to 1TB per month for free.
For the example queries below we will be working with the users
and badges
tables. All of the queries below will contain SELECT
and FROM
commands, so the section headers will denote when we add new commands to our queries.
SELECT, FROM, LIMIT
SELECT * FROM `bigquery-public-data.stackoverflow.users` LIMIT 5;
Result: Returns all columns *
from the users
table and limits it to five rows.
SELECT id, display_name FROM `bigquery-public-data.stackoverflow.users` LIMIT 5;
Result: Returns the id
and display_name
columns from the users
table limiting it to five rows.
SELECT u.id, u.display_name FROM `bigquery-public-data.stackoverflow.users` AS u LIMIT 5;
Result: This is the same query as the one before it.
We now see u.
prepended to our columns and AS u
added to our second line of code. This is called namespacing; and with this we can give our tables a temporary alias. Prefixing our columns with an alias of our table becomes useful when querying from multiple tables which we will see later. It is also worth noting that the second line of code will work without AS
, like so: FROM `bigquery-public-data.stackoverflow.users` u
.
DISTINCT, ORDER BY
SELECT DISTINCT u.id, u.display_name as name FROM `bigquery-public-data.stackoverflow.users` u ORDER BY u.display_name ASC LIMIT 5;
Result: Returns five, unique rows ( DISTINCT
) of the id
and display_name
columns from the users
table. It sorts them in ascending order (or alphabetically) by the display_name
column. We can also see that the display_name
column has been temporarily aliased as name
.
WHERE, AND, OR
SELECT * FROM `bigquery-public-data.stackoverflow.users` u WHERE u.up_votes >= 500 AND u.reputation >= 100 LIMIT 10;
AND
in line 3 can be replaced with OR
, below I explain the results with either.
Result with AND
: Returns all the columns (*) from the users
table, limiting it to ten results, where the number of upvotes is equal to 500 or more and the reputation score is equal to 100 or more.
Result with OR
: Returns all the columns (*) from the users
table, limiting it to ten results, where the number of upvotes is equal to 500 or more or the reputation score is equal to 100 or more.
LIKE, ILIKE
SELECT u.display_name, u.location FROM `bigquery-public-data.stackoverflow.users` u WHERE u.location LIKE 'United%' LIMIT 10;
Result: Returns the display_name and location columns from the users
table, limiting it to ten rows, where the location name begins with “United”. So we’d see users who live in United States, United Kingdom, etc. in our results. Alternatively, WHERE u.location LIKE ‘%United’
will return users where location name ends with United. So we’d likely see users with typos in the location field like “States United”.
Four things to note:
- The percentage signs (%) are wildcards
- You can negate a
LIKE
withNOT LIKE
-
LIKE
is case sensitive so capitalization matters -
ILIKE
is a case insensitiveLIKE
so capitalization does not matter
INNER JOIN
It is common to need to combine data from multiple tables in one query. For instance, we might want to run a query returning XXX. We can achieve this by joining the tables. There are several types of joins, but the most common one is an inner join as seen below.
SELECT b.name, b.class, u.display_name FROM `bigquery-public-data.stackoverflow.badges` b INNER JOIN `bigquery-public-data.stackoverflow.users` u ON b.user_id = u.id WHERE b.class = 1 LIMIT 5;
Result: Returns the name and class of badges with the associated display name of the user where the class is equal to 1 and limits results to five.
GROUP BY, AVG
Sometimes we will need to group our data and then run an aggregate function. Some frequently used aggregate functions are COUNT
, MIN
, and MAX
— we focus on AVG
in the example below.
SELECT b.name, AVG(u.reputation) FROM `bigquery-public-data.stackoverflow.badges` b INNER JOIN `bigquery-public-data.stackoverflow.users` u ON b.user_id = u.id GROUP BY b.name LIMIT 5;
Result: Answers this question — What is the average user reputation score per badge?
I hope this guide can be a useful resource for getting you comfortable making SQL queries. To learn more about the language I recommend checking out this great resource by Connor Leech. Let me know if you have any questions, what you liked/disliked, or want to see more of in the comments below!
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK