0

PostgreSQL: Data is Important

 2 years ago
source link: https://developer.squareup.com/blog/postgresql-data-is-important/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Written by Mike Lewis.

At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!

PostgreSQL: Data is Important

But Why Postgres?

  • Solves some problems with migrations
  • Full-featured indexes
  • Robust constraints
  • Streaming replication
  • Sophisticated query planner
  • PostGIS
  • Full text search
  • Advanced SQL features

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

Transactional DDL statements (not even Oracle has this)

  • wrap migrations in a SQL transaction
  • roll back everything in migration on a partial failure

Indexes

Constraints

  • CHECK constraints
  • foreign key constraints available for all tables
  • exclusion constraints (new in 9.0)

Benefits of validation in the DB

  • reduces dependencies on monolithic app
  • improves consistency guarantees
  • deeper level of safeguards
  • don’t have to worry about prefetching associations
  • just as easy to test

Streaming Replication

Sophisticated Query Planner

  • cost-based (as opposed rule-based) query optimizer
  • heuristics gathered on tables to optimize better
  • costs of random seek, seq scan, etc. configurable
  • e.g making random seeks on an SSD not a bad thing to the optimizer
  • genetic algorithms to plan complex queries.

Better EXPLAIN

MySQL sample

+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table               | type  | possible_keys                        | key                                  | key_len | ref  | rows | Extra                        |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
|  1 | PRIMARY     | NULL                | NULL  | NULL                                 | NULL                                 | NULL    | NULL | NULL | Select tables optimized away |
|  2 | DERIVED     | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5       | NULL |    1 | Using where                  |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+

Postgres Sample

QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.31..8.32 rows=1 width=0)
   ->  GroupAggregate  (cost=0.00..8.30 rows=1 width=12)
         Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
         ->  Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers  (cost=0.00..8.27 rows=1 width=12)
               Index Cond: ((user_id >= 69) AND (user_id <= 70))

Also… if you have a subquery, postgres won’t execute it

PostGIS (for geo queries)

  • extensible GIS system
  • R tree indexes for built-in geometric data types
  • geometric data types
  • free census data (TIGER) for geocoding (addresses→coordinates)
  • Rails integration example from railsonpostgresql.com (site no longer exists).
  • Restaurant.first(:conditions => [“thegeom && ?”, Polygon.fromcoordinates([[[xmin, ymin], [xmin, ymax], …]]], 4269)])

Full Text Search

  • full-featured text indexing
  • dictionaries (languages, stopwords, synonyms)
  • customizable search ranking algorithms
  • mature
  • lighter weight solution than solr/lucene

Advanced SQL Features

SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) 
FROM empsalary;

  depname  | empno | salary | rank 
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    3
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2

Mike Lewis (@MikeLewis) | Twitter The latest Tweets from Mike Lewis (@MikeLewis). maker of stuff. kitten enthusiast. iOS hacker/troublemaker at @square…twitter.com


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK