HackerNews Ranking Algorithm

 1 year ago
source link: https://vigneshwarar.substack.com/p/hackernews-ranking-algorithm-how

Here's a small thought experiment that explores an alternative approach to ranking posts on HackerNews.

One of my favorite sites to visit daily is HackerNews (HN). It has never failed to deliver good-quality links and very interesting discussions.

This is just a small thought experiment about a different ranking algorithm for HN. There's nothing wrong with the current one.

Just to refresh, according to Paul Graham's comment and FAQ, the HN algorithm is...

$$\text{rank} = \frac{P - 1}{(T + 2)^{G}}$$

P = Points

T = Age in hours

G = 1.8

The minus one removes the submitter's own upvote: by default, HN counts an upvote from the submitter on their own submission.

I’ve seen a version with penalties, but let’s stick to what’s official. Also, I just did a quick Grep search and found this article and a wonderful repo about “Reverse Engineering the Hacker News Ranking Algorithm”.

Intuitively, the ranking algorithm is simple: the more upvotes a link receives in a short amount of time, the higher it will be ranked. As time passes, the post will gradually move down the rankings.
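To make the formula and the intuition concrete, here is a minimal Python sketch. The function name `hn_rank` and the sample numbers are my own assumptions for illustration. It only covers the official formula quoted above, without penalties.

```python
def hn_rank(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Score from the formula above: (P - 1) / (T + 2) ** G."""
    return (points - 1) / (age_hours + 2) ** gravity


# The same post (100 points) scored at different ages: the score only decays.
for age in (1, 6, 24):
    print(age, round(hn_rank(100, age), 3))

# A fresh post with fewer points can outrank an older post with more points.
fresh = hn_rank(points=30, age_hours=1)
old = hn_rank(points=100, age_hours=12)
print(fresh > old)
```

Because G is greater than 1, age grows the denominator faster than a steady trickle of upvotes grows the numerator, which is why older posts sink.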

Here is my approach

I can't find the official page on why Paul Graham created HN, but to me, HN is the place to find the most interesting links and have a healthy discussion about various topics. In fact, discussions are my favorite part, except for the useless comments.

Given that discussions are my favorite part, why don't we apply PageRank for every user based on the upvotes they receive for the comments they leave on any post, and replace the P (points) value in the current version of the HN algorithm?

PageRank is an algorithm created by the founders of Google. It is used to determine the popularity of web pages on the internet.

By the way, the name PageRank does not suit this use case, so I am going to call it HackerRank (HR). Here is a visualization if you are trying to picture it.

[Figure: HackerRank visualization, https://substack-post-media.s3.amazonaws.com/public/images/11d75d52-c322-430a-b382-636457c898b3_2208x1984.png]

Since one user may well upvote multiple comments from the same user, we check whether a user has already upvoted a comment from that specific user before counting their upvote, so each pair of users contributes at most one edge. In other words, we treat user profiles as nodes and upvotes for comments as edges.
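A rough sketch of that graph construction, assuming the comment upvotes are available as `(voter, comment_author)` pairs. That input shape and the name `build_graph` are hypothetical. Storing the upvoted authors in a set gives the "already upvoted this user" check for free.

```python
from collections import defaultdict

# Hypothetical input: one (voter, comment_author) pair per comment upvote.
upvotes = [
    ("alice", "bob"),
    ("alice", "bob"),  # alice already upvoted bob, so this adds nothing
    ("carol", "bob"),
    ("bob", "alice"),
]


def build_graph(upvotes):
    """Return {voter: set of authors they upvoted}, one edge per user pair."""
    graph = defaultdict(set)
    for voter, author in upvotes:
        graph[voter].add(author)  # a set ignores repeated (voter, author) pairs
    return dict(graph)


graph = build_graph(upvotes)
print(graph)
```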

Considering this, the HackerRank of a user profile will look like this:

$$HR_i = \frac{1 - D}{N} + D \sum_{j \in \{1, \dots, n\}} \frac{HR_j}{TU_j}$$

N = Total number of user profiles on HN

D = Damping factor

HR_j = HackerRank score of user j, one of the n users who upvoted user i's comments

TU_j = Total upvotes given by user j

By default, the HR for every profile will be 1/N.
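Here is a minimal sketch of how the formula could be iterated, starting from 1/N for everyone. The damping factor 0.85 is the value commonly used for PageRank, not something fixed in this post, and the fixed iteration count and the name `hacker_rank` are likewise my assumptions. The code follows the formula as written, so a user who has upvoted nobody simply passes on no score.

```python
def hacker_rank(graph, users, damping=0.85, iterations=50):
    """Iterate HR_i = (1 - D) / N + D * sum(HR_j / TU_j) over upvoters j of i.

    graph maps each voter to the set of users whose comments they upvoted;
    every voter and author must appear in users.
    """
    n = len(users)
    hr = {user: 1 / n for user in users}  # default score 1/N
    for _ in range(iterations):
        nxt = {user: (1 - damping) / n for user in users}
        for voter, authors in graph.items():
            if not authors:
                continue
            share = damping * hr[voter] / len(authors)  # D * HR_j / TU_j
            for author in authors:
                nxt[author] += share
        hr = nxt
    return hr


users = ["alice", "bob", "carol"]
graph = {"alice": {"bob"}, "bob": {"alice"}, "carol": {"bob"}}
scores = hacker_rank(graph, users)
print(scores)
```

In this toy graph bob is upvoted by two users and carol by none, so bob ends up with the highest score and carol with only the baseline (1 - D) / N.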

But I won't use HackerRank as it is. Sometimes HN comments are inappropriate and get flagged. We should take flagging into account because, remember, HN should be a place to have healthy discussions.

Let's consider that 1 flag equals a deduction of 20% from the HR score. However, we will only take into account the flags received in the current month, as people can change from being unpleasant to becoming better human beings.

So, HR with “flag” consideration will look like this.

$$HR_{\text{withflag}} = HR - \big((TF \cdot FP) \cdot HR\big)$$

$$HR = \max(0, HR_{\text{withflag}})$$

TF = Total flags received in the current month

FP = Flag penalty which is 20%

If HR is negative, then it will be 0.
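The flag adjustment is small enough to sketch in a few lines. The function name `apply_flag_penalty` is hypothetical, and counting only the current month's flags is assumed to happen before this function is called.

```python
def apply_flag_penalty(hr: float, flags_this_month: int,
                       flag_penalty: float = 0.20) -> float:
    """HR - (TF * FP) * HR, clamped at zero."""
    return max(0.0, hr - (flags_this_month * flag_penalty) * hr)


for flags in (0, 1, 5, 6):
    print(flags, apply_flag_penalty(0.5, flags))
```

With a 20% penalty per flag, five flags in a month already wipe out the whole score, and any further flags are absorbed by the clamp at 0.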

We now have an HR for every user profile. Hooking it into the current version of the HN algorithm, in place of the points, gives the final ranking algorithm:

$$\text{rank}_i = \frac{\sum_{j \in \{1, \dots, n\}} HR_j}{(T + 2)^{G}}$$

HR_j = HackerRank score of profile j, one of the n profiles that upvoted this post

T = Age in hours

G = 1.8
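Putting it together, a post's score could be computed roughly like this. The HR values below are made up for illustration, and `hr_rank` is a hypothetical name. The scores are assumed to already include the flag penalty.

```python
def hr_rank(upvoters, hr, age_hours, gravity=1.8):
    """Sum of the upvoters' HR scores divided by (T + 2) ** G."""
    return sum(hr[user] for user in upvoters) / (age_hours + 2) ** gravity


# Made-up HR scores and two posts of the same age.
hr = {"alice": 0.4, "bob": 0.4, "carol": 0.05, "dave": 0.05, "erin": 0.1}
post_a = hr_rank(["alice", "bob"], hr, age_hours=3)           # 2 upvotes
post_b = hr_rank(["carol", "dave", "erin"], hr, age_hours=3)  # 3 upvotes

print(post_a > post_b)
```

Post A has fewer upvotes than post B, but its upvoters carry more HackerRank, so it ranks higher. That is the whole point of replacing raw points with HR.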

This is a very simple approach, but here are some other ideas that are worth exploring.

  • A HackerRank score that is also determined by the upvotes a user receives for their submissions.

  • Reading time for the article.

  • Track how well a website is performing on HN and put it on the front page if the website has a high reputation for performing well.

Would I ever publicly write about how HN ranks posts if I were Dang (HN moderator)? No, because PageRank can be manipulated despite its reputation. In fact, PageRank has been exploited for years. Moreover, companies have financial incentives to get on the front page of HN.

I would use HackerRank for ranking posts but publicly say that we are using Paul Graham’s original algorithm. I would also hide the upvote counts on comments, since they power HackerRank, and take some additional steps to prevent reverse engineering and rank manipulation.

But I am curious: how would you have done it if you were designing the HN algorithm? Please leave your thoughts in the comments.

Discussion on HN → https://news.ycombinator.com/item?id=35510413

Plug: Hey, we are building a new kind of search engine. Our goal is to deliver authoritative and non-SEO-spammed results. Please check it out and let me know your feedback.
