23

Learning Languages From ABCs to SQL

 5 years ago
source link: https://www.tuicool.com/articles/hit/AniEJvV
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Learning Languages From ABCs to SQL

A mini-dive into SQL queries beyond the basic SELECT and FROM commands.

EbuAf2j.jpg!webe2EZjyb.jpg!web
Photo via  Pexels

If you can get past the cheesiness of my title I’d like to teach you some SQL querying commands beyond the basic SELECT and FROM commands you might see as requirements on a job description. I will be writing code you can use to query the StackOverflow dataset via Google BigQuery Public Datasets . You can query this data up to 1TB per month for free.

For the example queries below we will be working with the users and badges tables. All of the queries below will contain SELECT and FROM commands, so the section headers will denote when we add new commands to our queries.

SELECT, FROM, LIMIT

SELECT *
FROM `bigquery-public-data.stackoverflow.users`
LIMIT 5;

Result: Returns all columns * from the users table and limits it to five rows.

SELECT id, display_name
FROM `bigquery-public-data.stackoverflow.users`
LIMIT 5;

Result: Returns the id and display_name columns from the users table limiting it to five rows.

SELECT u.id, u.display_name
FROM `bigquery-public-data.stackoverflow.users` AS u
LIMIT 5;

Result: This is the same query as the one before it.

We now see u. prepended to our columns and AS u added to our second line of code. This is called namespacing; and with this we can give our tables a temporary alias. Prefixing our columns with an alias of our table becomes useful when querying from multiple tables which we will see later. It is also worth noting that the second line of code will work without AS , like so: FROM `bigquery-public-data.stackoverflow.users` u .

DISTINCT, ORDER BY

SELECT DISTINCT u.id, u.display_name as name
FROM `bigquery-public-data.stackoverflow.users` u
ORDER BY u.display_name ASC
LIMIT 5;

Result: Returns five, unique rows ( DISTINCT ) of the id and display_name columns from the users table. It sorts them in ascending order (or alphabetically) by the display_name column. We can also see that the display_name column has been temporarily aliased as name .

WHERE, AND, OR

SELECT *
FROM `bigquery-public-data.stackoverflow.users` u
WHERE u.up_votes >= 500 AND u.reputation >= 100
LIMIT 10;

AND in line 3 can be replaced with OR , below I explain the results with either.

Result with AND : Returns all the columns (*) from the users table, limiting it to ten results, where the number of upvotes is equal to 500 or more and the reputation score is equal to 100 or more.

Result with OR : Returns all the columns (*) from the users table, limiting it to ten results, where the number of upvotes is equal to 500 or more or the reputation score is equal to 100 or more.

LIKE, ILIKE

SELECT u.display_name, u.location 
FROM `bigquery-public-data.stackoverflow.users` u
WHERE u.location LIKE 'United%'
LIMIT 10;

Result: Returns the display_name and location columns from the users table, limiting it to ten rows, where the location name begins with “United”. So we’d see users who live in United States, United Kingdom, etc. in our results. Alternatively, WHERE u.location LIKE ‘%United’ will return users where location name ends with United. So we’d likely see users with typos in the location field like “States United”.

Four things to note:

  • The percentage signs (%) are wildcards
  • You can negate a LIKE with NOT LIKE
  • LIKE is case sensitive so capitalization matters
  • ILIKE is a case insensitive LIKE so capitalization does not matter

INNER JOIN

It is common to need to combine data from multiple tables in one query. For instance, we might want to run a query returning XXX. We can achieve this by joining the tables. There are several types of joins, but the most common one is an inner join as seen below.

SELECT b.name, b.class, u.display_name
FROM `bigquery-public-data.stackoverflow.badges` b
INNER JOIN `bigquery-public-data.stackoverflow.users` u
ON b.user_id = u.id
WHERE b.class = 1 
LIMIT 5;

Result: Returns the name and class of badges with the associated display name of the user where the class is equal to 1 and limits results to five.

GROUP BY, AVG

Sometimes we will need to group our data and then run an aggregate function. Some frequently used aggregate functions are COUNT , MIN , and MAX — we focus on AVG in the example below.

SELECT b.name, AVG(u.reputation)
FROM `bigquery-public-data.stackoverflow.badges` b
INNER JOIN `bigquery-public-data.stackoverflow.users` u
ON b.user_id = u.id 
GROUP BY b.name
LIMIT 5;

Result: Answers this question — What is the average user reputation score per badge?

I hope this guide can be a useful resource for getting you comfortable making SQL queries. To learn more about the language I recommend checking out this great resource by Connor Leech. Let me know if you have any questions, what you liked/disliked, or want to see more of in the comments below!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK