Database IDs Have No Place in URIs (2008)

I’ve been beta testing Jeff Atwood’s and Joel Spolsky’s latest venture, Stack Overflow . In case you haven’t heard, Stack Overflow is a new site where programmers can go to get their programming questions answered by other programmers. It’s a similar idea to sites like Expert Sexchange [sic], but without the pay wall and general crapness.

I really like what I’ve seen so far. I think Stack Overflow is very well put together and even during the current beta stage it’s pretty slick. The reputation and badges system whereby you gain points and virtual awards for the contributions you make, is fun and strangely addictive. Most importantly though, it’s already genuinely useful. However (and you knew this was coming), one aspect of the implementation of Stack Overflow troubles me somewhat: it appears the site is using numeric database IDs within URI s.

Note that I use URI rather than URL because one of the stated aims for Stack Overflow is that its content be readily indexable by search engines, so that people can enter their specific programming question into Google and hopefully amongst the top results should be the definitive answer on Stack Overflow. It’s known that Google does place importance on URIs, so you want them to be meaningful and immutable.

URIs in Stack Overflow look like this:

http://stackoverflow.com/questions/13204/why-doesnt-my-cron-job-work-properly

—As you can see, there’s a number, followed by a Google-friendly version of the question title, which is often called a “slug”.

It’s generally considered A Bad Idea to expose your database IDs in URIs and here are several reasons why:

If you have to move the site to a different box, can you guarantee that those database IDs will remain the same?
If you have to restore the site’s data from a backup, can you guarantee that those database IDs will remain the same?
If you have to switch to a different database server (say from Microsoft SQL Server to Oracle), can you guarantee that those database IDs will remain the same?

If you answered “no” to any of the questions above, then you’ve broken a fundamental rule of URIs (permalinks), which is that they’re supposed to stay the same forever! Including database IDs exposes implementation detail to the world. It might give me an idea of how many questions or answers there are and I might feel inclined to start hacking around, putting different numbers in to see if anything interesting happens.

URIs should be meaningful, not cluttered with meaningless information. Another example of this is URIs that include pseudo file extensions. For example, .do (Struts) or .aspx (ASP.NET). Why should a site’s visitors care what technology it’s implemented in? Think about what information you’d want your URIs to divulge if you were able to look at them in a hundred year’s time and discard everything else. Keep them meaningful and clean.

I did question the use of database IDs using the Stack Overflow Feedback Forum and got the following official response:

“without the numeric ID, it’d be nearly impossible to map text to database IDs.”

Now unless I’m fundamentally misunderstanding a key part of how Stack Overflow is implemented—which is entirely possible for I have no access to the project team or source code—I find it difficult to understand this statement. Surely all that’s required is an indexed database column that stores the question slug that forms the end of the URI. The slug itself can easily be automatically generated when a new question is saved. Then you can simply retrieve a question by its slug. Using the ActiveRecord dynamic finders in Ruby on Rails it might look like this:

question = Question.find_by_slug params[:id]

—Now you have a fully-formed Question instance and you can go off and get its answers etc. Of course in production code you’d want to make sure that the incoming request parameter is trustworthy rather than passing it straight to the finder method, but that’s not the point of the example. The key point is that the slug in the URI is the key that’s used to get hold of your model object; everything can still be stitched together using numeric database IDs under the hood, just don’t expose that to your site’s visitors. Please don’t pollute your URIs with implementation specifics.

http://stackoverflow.com/questions/13204/why-doesnt-my-cron-job-work-properly

Recommend

Pictures from JAX 2019

火绒安全周报：大量网易邮箱账号遭公开叫卖 D-Link 智能摄像头被攻击 - 知乎

把Palm装进安卓手机！玩转PHEM模拟器 - 知乎

一段万能的Nginx接口实现反向代理配置值得收藏！

Zookeeper和跨数据中心分布式协调 - 知乎

Next Generation Plasma Notifications

GitHub5800星！一键还原百年老电影、黑白旧照片本色

新一代缓存-Caffeine

Can we do better than SQL?

Spunky: A kernel using Ada - Part 1: RPC

About Joyk