Scaling INET queries in PostgreSQL


A very common requirement in most SOCs is to compare a given IP address against a table to extract contextual information such as originating country, owning organization, and so on. At CyberSift we had a similar challenge. Most geo providers will let you download data that loads into a table in this format:

CREATE TABLE "ip_loc" (
"start_ip" inet,
"end_ip" inet,
"iso2" varchar(4),
"state" varchar(100),
"city" varchar(100)
);
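As a sketch, loading a provider’s CSV dump into this table from psql might look like the following (the file name and column order are assumptions):

-- hypothetical CSV load; adjust the file name and column order to your provider
\copy ip_loc (start_ip, end_ip, iso2, state, city) FROM 'geoip.csv' WITH (FORMAT csv)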

You then need to query the table for the row where your given IP address falls between start_ip and end_ip. The problem is that, at last count, there are 11,481,758 rows in that table – and that is just for IPv4. I can’t imagine what will happen with IPv6.

Your first (correct) thought would be to create an index on start_ip and end_ip, then use something like:

SELECT * FROM ip_loc WHERE inet(host('8.8.8.8')) between start_ip and end_ip LIMIT 1;

ASIDE: the above is one way of doing it… you could use other operators to the same effect.
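For completeness, the naive index might look like this (the index name is my own):

-- hypothetical b-tree index over the range bounds
CREATE INDEX ip_loc_range_idx ON ip_loc (start_ip, end_ip);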

Close… but the query is still fantastically slow, taking on the order of a couple of seconds to complete. At CyberSift’s scale, this is a problem. Running EXPLAIN on the query shows us why:

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.00..0.68 rows=1 width=52)
   ->  Seq Scan on ip_loc  (cost=0.00..867875.74 rows=1275751 width=52)
[truncated output]

A sequential scan… even though we have an index covering those columns?

Why? Because even if you create an index on those two columns, there is no guarantee that Postgres will actually use it. If the query planner estimates that too many rows will be returned (NB: estimated, not actually returned), it falls back to a sequential scan of the table, which we want to avoid at all costs. You can see this in the plan above: the planner estimates 1,275,751 matching rows, roughly 11% of the table, so it judges the sequential scan to be cheaper than the index.
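One way to confirm that this is the planner’s choice rather than a missing index is to disable sequential scans for the session and re-run EXPLAIN (a diagnostic sketch only; not something to leave on in production):

SET enable_seqscan = off;
EXPLAIN SELECT * FROM ip_loc WHERE '8.8.8.8'::inet BETWEEN start_ip AND end_ip LIMIT 1;
RESET enable_seqscan;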

To solve this, we used a pre-processing step. When loading our data, we merge the two columns into a single subnet and store it in an additional inet column. This is easy to do even after the initial table is created:

ALTER TABLE ip_loc ADD COLUMN subnet inet;
-- populate first, then index, so the bulk UPDATE avoids index-maintenance overhead
UPDATE ip_loc SET subnet = inet_merge(start_ip, end_ip);
CREATE INDEX ON ip_loc USING gist (subnet inet_ops);

Note the GiST index on the new column, and how it is populated using the magical inet_merge function 🙂 One caveat: inet_merge returns the smallest network containing both addresses, so if a provider’s range does not fall exactly on a CIDR boundary, the stored subnet will be slightly wider than the original start/end range.
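To see what inet_merge actually does, here is a quick example (the addresses are my own illustration):

-- inet_merge returns the smallest CIDR network containing both inputs
SELECT inet_merge('192.0.2.0'::inet, '192.0.2.255'::inet);
-- => 192.0.2.0/24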

Now your queries can be changed from “between two inets” to “contained in the subnet”. From the query planner’s point of view, this is expected to return far fewer rows, so it will use the GiST index we created above, leading to much faster queries. For example:

select * from ip_loc where '92.56.6.3'::inet << subnet;

ASIDE: << is strict containment, so a /32 subnet does not contain itself. If your table has single-address ranges, use <<= instead.

EXPLAIN shows the change:

[Screenshot: EXPLAIN output showing an index scan using the GiST index on subnet, instead of the sequential scan]

We’re back down to a couple of milliseconds 🎉🎉
