Why isn’t Postgres using my index?

Why isn’t Postgres using my index? — pgMustard

If you spend any amount of time working with Postgres, it’s likely that at some point you’re going to wonder why it isn’t using an index that you think it should be.

Unlike some databases, you can’t force PostgreSQL to use a specific index, but there are several things you can do to work out what’s going on.

If you’re somewhat new to Postgres, I’m hoping going through these will be interesting, as well as being useful next time you see an issue like this.

I’ll be assuming a basic familiarity with query plans (using EXPLAIN) and Postgres indexing, but honestly not much.

The two main reasons

There are two main categories of reason that result in Postgres not using an index. Either it can’t use the index, or it doesn’t think using the index will be faster.

Working out which of these categories you’re seeing is a great starting point.

Sadly, we can’t tell from a single query plan which case we’re seeing, but through some investigating via multiple query plans, we can work out what’s going on.

Since the reasons that Postgres can’t use an index are more straightforward, and it can be easy to rule out quickly too, I prefer to check that first.

Checking whether Postgres can use the index

There are a few reasons Postgres won’t use a seemingly perfect index for a query.

For example:

A function is preventing its use
A data type mismatch is preventing its use
The index does not exist (oops)

To start with, you can try to disprove you’re in this category of issue by getting a query plan that does use the index.

As I alluded to earlier, some other databases have a “query hints” feature that allows you to (pretty much) force the use of a specific index. But since PostgreSQL does not have such a feature, we can steer clear of that holy war and look at what we can do instead.

For example, Postgres has several parameters that let us temporarily discourage the planner’s choice of certain operations. If you want to encourage an index scan in place of a sequential scan, you can try:

set enable_seqscan = off;

This does not, as the same might suggest, completely disable sequential scans, but rather discourages them by making the planner think they’d be a LOT more expensive than they really would be.

As such, if after setting this to off, this you still get a Seq Scan, there’s a really good chance that Postgres can’t use the index you want, and you’re seeing this category of issue. If, however, it does now use the index you’d hoped, you can skip to the other reasons.

Similarly, if Postgres is picking a different index instead of the index you think would be optimal, then this trick to make indexes invisible shared by Haki Benita (to drop an index and then rollback, all inside a transaction) is a nice way to test this, in a non-production environment of course. Again, whether the query plan changes or not gives you a very good signal as to which category of problem you have.

If Postgres can’t use the index

If you think you’re in the boat of Postgres not being able to use the index, it is first worth checking that the index you’re expecting Postgres to use actually exists (in the environment you’re testing in). If it doesn’t, please believe me that you are not the first, and won’t be the last, to be in this situation! And at least you now have a simple solution.

Assuming the index does exist, though, the next step is to check its definition. This will be helpful for the next two checks we’ll do.

Functions

If your query has a function on the data involved, your index will likely need to match it to get used. For example, a query like:

select * from t where lower(email) = ‘[email protected]’;

Won’t use a simple index on “email”, even if in theory it could.

To quickly test if this is the issue you’re seeing, you can request the query plan for the query without the function, for example:

explain select * from t where email = ‘[email protected]’;

If you see it is now using your index, and this is how you commonly query that column, you may wish to add a functional index. For example, in this case:

create index i on t (lower(email));

Similarly, you might see this when doing arithmetic on a column. For example, the following query would require an index on “(column + 1)”:

select * from t where column + 1 = 10;

Whereas the equivalent query below would be able to use an index on “column”:

select * from t where column = 10 - 1;

In cases like this, if you are able to change the query, that is normally the better solution.

Datatype mismatches

Earlier, I also mentioned datatype mismatches. You might spot these in the query plan through datatype casting (the :: characters). Postgres can handle some of these without issue, eg varchar to text, but some casts will result in an index not being used.

Here is a contrived example:

explain select * from t where id = 100::numeric;

The explicit cast prevents use of my index, resulting in this query plan:

 Seq Scan on t  (cost=0.00..2041.00 rows=500 width=10)
   Filter: ((id)::numeric = '100'::numeric)

To test whether this is the issue, you can try explicitly casting to the column’s datatype in your query, or avoiding casting being added by the application (eg via an ORM).

If Postgres can use the index, but doesn’t think it will be faster

If you’ve determined that Postgres can use the index, but is choosing not to, then this next section is for you.

There are usually several ways Postgres can get the results of a given query. As such, before executing a query, the planner will estimate the “cost” of different options, and pick the plan with the lowest cost in the hope that it will be the fastest way of executing that query. It’s important to note that these costs are in an arbitrary unit, where higher numbers are a predictor of higher execution times, so it’s the relative numbers that matter, not the absolutes.

As such, if Postgres chooses not to use your index, it is likely that it is calculating the cost of that query plan to be higher than the one it has chosen.

We can verify this by using the tricks mentioned in the previous section, like enable_seqscan = false, or hiding other indexes, and comparing the estimated costs of the different query plans.

The next step is to check whether the execution time of the plan that uses the index is actually faster, or not. You may wish to run it a few times to warm up the cache. If it isn’t faster, you are looking at a case where the planner has correctly chosen not to use your index.

At this point, it is worth noting that the Postgres planner will try to optimize for execution time, rather than efficiency (eg blocks of data read). Although these usually go hand in hand, it is a subtle distinction worth bearing in mind.

There are a few (normally good) reasons for Postgres choosing a sequential scan even when it could use an index scan:

If the table is small
If a large proportion of the rows are being returned
If there is a LIMIT clause and it thinks it can abort early

If none of these are the case, you may wish to skip ahead to the cost estimation section.

If the table is small

If a table is small (very roughly 100 rows or fewer), Postgres may estimate that it will be faster to read the table sequentially and filter out rows as needed, even for highly selective queries.

For example, in pgMustard we show scans on small tables as an anti-tip (scoring 0.0 out of 5.0 for index potential):

The two main reasons

Checking whether Postgres can use the index

If Postgres can’t use the index

Functions

Datatype mismatches

If Postgres can use the index, but doesn’t think it will be faster

If the table is small

Recommend

Why isn't 1 a prime number?

Why Your Dockerized Application Isn’t Receiving Signals

Why Isn’t X a Hook?

Why IBM’s Blockchain Isn’t a Real Blockchain

Fast Fulltext Search With Postgres Gin Index

Postgres GIN Index in Rails

How Postgres chooses which index to use for a query

查 Postgres 的 index 使用情况

為什麼 Postgres 不選擇 index scan 卻選擇 seq scan？

Napkin Problem 21: Index Merges vs Composite Indexes in Postgres and MySQL

About Joyk