14

Why isn’t Postgres using my index?

 2 years ago
source link: https://www.pgmustard.com/blog/why-isnt-postgres-using-my-index
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
Why isn’t Postgres using my index? — pgMustard

Cover photo: Emily Morter

If you spend any amount of time working with Postgres, it’s likely that at some point you’re going to wonder why it isn’t using an index that you think it should be.

Unlike some databases, you can’t force PostgreSQL to use a specific index, but there are several things you can do to work out what’s going on.

If you’re somewhat new to Postgres, I’m hoping going through these will be interesting, as well as being useful next time you see an issue like this.

I’ll be assuming a basic familiarity with query plans (using EXPLAIN) and Postgres indexing, but honestly not much.

The two main reasons

There are two main categories of reason that result in Postgres not using an index. Either it can’t use the index, or it doesn’t think using the index will be faster.

Working out which of these categories you’re seeing is a great starting point.

Sadly, we can’t tell from a single query plan which case we’re seeing, but through some investigating via multiple query plans, we can work out what’s going on.

Since the reasons that Postgres can’t use an index are more straightforward, and it can be easy to rule out quickly too, I prefer to check that first.

Checking whether Postgres can use the index

There are a few reasons Postgres won’t use a seemingly perfect index for a query.

For example:

  • A function is preventing its use 

  • A data type mismatch is preventing its use

  • The index does not exist (oops)

To start with, you can try to disprove you’re in this category of issue by getting a query plan that does use the index.

As I alluded to earlier, some other databases have a “query hints” feature that allows you to (pretty much) force the use of a specific index. But since PostgreSQL does not have such a feature, we can steer clear of that holy war and look at what we can do instead.

For example, Postgres has several parameters that let us temporarily discourage the planner’s choice of certain operations. If you want to encourage an index scan in place of a sequential scan, you can try:

set enable_seqscan = off;

This does not, as the same might suggest, completely disable sequential scans, but rather discourages them by making the planner think they’d be a LOT more expensive than they really would be.

As such, if after setting this to off, this you still get a Seq Scan, there’s a really good chance that Postgres can’t use the index you want, and you’re seeing this category of issue. If, however, it does now use the index you’d hoped, you can skip to the other reasons.

Similarly, if Postgres is picking a different index instead of the index you think would be optimal, then this trick to make indexes invisible shared by Haki Benita (to drop an index and then rollback, all inside a transaction) is a nice way to test this, in a non-production environment of course. Again, whether the query plan changes or not gives you a very good signal as to which category of problem you have.

If Postgres can’t use the index

If you think you’re in the boat of Postgres not being able to use the index, it is first worth checking that the index you’re expecting Postgres to use actually exists (in the environment you’re testing in). If it doesn’t, please believe me that you are not the first, and won’t be the last, to be in this situation! And at least you now have a simple solution.

Assuming the index does exist, though, the next step is to check its definition. This will be helpful for the next two checks we’ll do.

Functions

If your query has a function on the data involved, your index will likely need to match it to get used. For example, a query like:

select * from t where lower(email) = ‘[email protected]’;

Won’t use a simple index on “email”, even if in theory it could.

To quickly test if this is the issue you’re seeing, you can request the query plan for the query without the function, for example:

explain select * from t where email = ‘[email protected]’;

If you see it is now using your index, and this is how you commonly query that column, you may wish to add a functional index. For example, in this case:

create index i on t (lower(email));

Similarly, you might see this when doing arithmetic on a column. For example, the following query would require an index on “(column + 1)”:

select * from t where column + 1 = 10;

Whereas the equivalent query below would be able to use an index on “column”:

select * from t where column = 10 - 1;

In cases like this, if you are able to change the query, that is normally the better solution.

Datatype mismatches

Earlier, I also mentioned datatype mismatches. You might spot these in the query plan through datatype casting (the :: characters). Postgres can handle some of these without issue, eg varchar to text, but some casts will result in an index not being used.

Here is a contrived example:

explain select * from t where id = 100::numeric;

The explicit cast prevents use of my index, resulting in this query plan:

 Seq Scan on t  (cost=0.00..2041.00 rows=500 width=10)
   Filter: ((id)::numeric = '100'::numeric)

To test whether this is the issue, you can try explicitly casting to the column’s datatype in your query, or avoiding casting being added by the application (eg via an ORM).

If Postgres can use the index, but doesn’t think it will be faster

If you’ve determined that Postgres can use the index, but is choosing not to, then this next section is for you.

There are usually several ways Postgres can get the results of a given query. As such, before executing a query, the planner will estimate the “cost” of different options, and pick the plan with the lowest cost in the hope that it will be the fastest way of executing that query. It’s important to note that these costs are in an arbitrary unit, where higher numbers are a predictor of higher execution times, so it’s the relative numbers that matter, not the absolutes. 

As such, if Postgres chooses not to use your index, it is likely that it is calculating the cost of that query plan to be higher than the one it has chosen.

We can verify this by using the tricks mentioned in the previous section, like enable_seqscan = false, or hiding other indexes, and comparing the estimated costs of the different query plans.

The next step is to check whether the execution time of the plan that uses the index is actually faster, or not. You may wish to run it a few times to warm up the cache. If it isn’t faster, you are looking at a case where the planner has correctly chosen not to use your index.

At this point, it is worth noting that the Postgres planner will try to optimize for execution time, rather than efficiency (eg blocks of data read). Although these usually go hand in hand, it is a subtle distinction worth bearing in mind.

There are a few (normally good) reasons for Postgres choosing a sequential scan even when it could use an index scan:

  1. If the table is small

  2. If a large proportion of the rows are being returned

  3. If there is a LIMIT clause and it thinks it can abort early

If none of these are the case, you may wish to skip ahead to the cost estimation section.

If the table is small

If a table is small (very roughly 100 rows or fewer), Postgres may estimate that it will be faster to read the table sequentially and filter out rows as needed, even for highly selective queries.

For example, in pgMustard we show scans on small tables as an anti-tip (scoring 0.0 out of 5.0 for index potential):


Recommend

  • 54
    • www.tuicool.com 5 years ago
    • Cache

    Why isn't 1 a prime number?

    An engineer friend of mine recently surprised me by saying he wasn’t sure whether the number 1 was prime or not. I was surprised because among mathematicians, 1 is universally regarded as non-prime. The confusion beg...

  • 6

    Why Your Dockerized Application Isn’t Receiving Signals19 June 2017Proper cleanup when terminating your application isn’t less important when it’s running inside of a Docker container. Although it only comes down to making...

  • 4
    • overreacted.io 3 years ago
    • Cache

    Why Isn’t X a Hook?

    Why Isn’t X a Hook?January 26, 2019 • ☕️☕️ 8 min readSince the first alpha version of React Hooks was released, there is a question that keeps coming up in discussions: “Why isn’t &l...

  • 14

    Why IBM’s Blockchain Isn’t a Real Blockchain

  • 12

    The first thing we need to do is to create a lot of fake data so we can test our indexex, it's also important to use real words dependending on the index that you're trying to use, there's a clever way of doing this on linux, if you're...

  • 4
    • blog.kiprosh.com 2 years ago
    • Cache

    Postgres GIN Index in Rails

    rails Published on  21 December 2021...

  • 7

    How Postgres Chooses Which Index To Use For A QueryBy Lukas FittlApril 01, 2022

  • 11
    • blog.niclin.tw 2 years ago
    • Cache

    查 Postgres 的 index 使用情况

    Nic Lin's Blog喜歡在地上滾的工程師一般來說,打了 index 都會想看成效,使用的情況如何,在 postgreSQL 內可以直接下這行指令SELECT * FROM pg_stat_user_index...

  • 6

    最近在分析影響效能的 Query,發現 PostgreSQL 有時的查詢效能不如我們預期,用了 EXPLAIN 下去分析索引,發現確實新增的 index 並沒有在 query plain 裡面,我想瞭解為什麼。評估結果是 seq scan 更快在數據量很小的時候, seq scan 會比 inde...

  • 7

    Index Merges vs Composite Indexes in Postgres and MySQLNov 2022Composite indexes are about 10x faster than index merges. In Postgres, the gap is larger than in MySQL because Postgres doesn't s...

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK