Digging into Postgresql and DEV
source link: https://dev.to/devteam/digging-into-postgresql-and-dev-3e43
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Digging into Postgresql and DEV
Earlier today I was reviewing the draft Community Wellness badge pull request; and with my head deep in SQL these days, I thought I’d give a go at crafting a query to create this logic.
The following query finds the user IDs and weeks since today in which a user has written at least two comments that don’t have a negative moderator reaction.
-
user_id
: The user’s database ID. -
weeks_ago
: The number of weeks (since today) in which we’re grouping comments -
number_of_comments_with_positive_reaction
: How many positive reaction comments did they have for theweeks_ago
.
SELECT user_id,
COUNT(user_id) as number_of_comments_with_positive_reaction,
/* Get the number of weeks, since today for posts */
(trunc((extract(epoch FROM (current_timestamp- created_at))) / 604800)) AS weeks_ago
FROM comments
/* Only select comments from the last 32 weeks that
don't have a negative moderator reaction */
INNER JOIN
/* Find all comments in the last 32 weeks */
(SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
/* Omit any comments that got a negative moderator reaction */
EXCEPT
SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
AND category IN ('thumbsdown', 'vomit')) AS positve_reactions
ON comments.id = positve_reactions.reactable_id
INNER JOIN
/* Find the users who have at least two comments in the last week */
(SELECT count(id) AS number_of_comments,
user_id AS comment_counts_user_id
FROM comments
WHERE created_at >= (now() - interval '7' day)
GROUP BY user_id) AS comment_counts
ON comments.user_id = comment_counts_user_id
AND comment_counts.number_of_comments > 1
/* Don’t select anything older than 224 days ago, or 32 weeks ago */
WHERE created_at > (now() - interval '224' day)
GROUP BY user_id, weeks_ago
Enter fullscreen mode
Exit fullscreen mode
The above query creates multiple rows per user id. Which is fine, but if you want to loop through things, you’ll need to bust out some temporary variable magic.
I was wondering if I'd be able to get this down to one query. With the help of some folks at Forem, I wrote the following query aggregates that information for you; but you need to do some assembly work.
The columns are:
-
user_id
: The user’s database ID. -
serialized_weeks_ago
: A comma separated list of the weeks in which we had comments. -
weeks_ago_array
: An array of integers that is the non-string representation ofserialized_weeks_ago
; we want to see how ActiveRecord handles this array of integers. It's a the simpler version of theserialized_weeks_ago
. -
serialized_comment_counts
: A comma separated list of the number of comments.
The first number of the serialized_weeks_ago
maps to the first number of the serialized_comment_counts
. And you get one row per user.
SELECT user_id,
/* A comma separated string of "weeks_ago" */
array_to_string(array_agg(weeks_ago), ',') AS serialized_weeks_ago,
/* Will active record convert this to an array of integers? */
array_agg(weeks_ago) AS weeks_ago_array,
/* A comma separated string of comment counts. The first value in this string
happens on the week that is the first value in serialized_weeks_ago */
array_to_string(array_agg(number_of_comments_with_positive_reaction), ',') AS serialized_comment_counts
FROM
/* This is the same query as the first example query */
(SELECT user_id,
COUNT(user_id) as number_of_comments_with_positive_reaction,
/* Get the number of weeks, since today for posts */
(trunc((extract(epoch FROM (current_timestamp- created_at))) / 604800)) AS weeks_ago
FROM comments
/* Only select comments from the last 32 weeks that
don't have a negative moderator reaction */
INNER JOIN
/* Find all comments in the last 32 weeks */
(SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
/* Omit any comments that got a negative moderator reaction */
EXCEPT
SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
AND category IN ('thumbsdown', 'vomit')) AS positve_reactions
ON comments.id = positve_reactions.reactable_id
INNER JOIN
/* Find the users who have at least two comments in the last week */
(SELECT count(id) AS number_of_comments,
user_id AS comment_counts_user_id
FROM comments
WHERE created_at >= (now() - interval '7' day)
GROUP BY user_id) AS comment_counts
ON comments.user_id = comment_counts_user_id
AND comment_counts.number_of_comments > 1
/* Don’t select anything older than 224 days ago, or 32 weeks ago */
WHERE created_at > (now() - interval '224' day)
GROUP BY user_id, weeks_ago
) AS user_comment_counts_by_week GROUP BY user_id
Enter fullscreen mode
Exit fullscreen mode
I am eager to share these Postgresql approaches as they can help circumvent running lots of smaller queries. I also had the chance to pair up with two folks to make sure we wrote the correct logic and it was performant enough.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK