0
PostgreSQL: Data is Important
source link: https://developer.squareup.com/blog/postgresql-data-is-important/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Written by Mike Lewis.
At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!
PostgreSQL: Data is Important
But Why Postgres?
- Solves some problems with migrations
- Full-featured indexes
- Robust constraints
- Streaming replication
- Sophisticated query planner
- PostGIS
- Full text search
- Advanced SQL features
Migration Improvements
No downtime for most DDL migrations. Instantly performs:
- dropping and renaming columns
- adding columns (if not NOT NULL)
- creating indexes (CREATE INDEX CONCURRENTLY)
Transactional DDL statements (not even Oracle has this)
- wrap migrations in a SQL transaction
- roll back everything in migration on a partial failure
Indexes
- indexes on expressions
- CREATE INDEX ON USERS md5(id)
- conditional indexes
- CREATE UNIQUE INDEX ON paymentsources (userid, status) WHERE status = ‘active’;
- inverted indexes
- full-text search, array operations (inclusion tests, etc), ltrees
- very extensible
- tablespaces
- can put indexes on different disk than the tables.
- fully decoupled from table store
Constraints
- CHECK constraints
- foreign key constraints available for all tables
- exclusion constraints (new in 9.0)
Benefits of validation in the DB
- reduces dependencies on monolithic app
- improves consistency guarantees
- deeper level of safeguards
- don’t have to worry about prefetching associations
- just as easy to test
Streaming Replication
- new in 9.0
- streams WALs to standby servers while being generated
- less latency than file-based replication
- more consistent than statement-based replication (MySQL <5.1)
- synchronous capabilities coming in 9.1 guarantees 2-safe durability
Sophisticated Query Planner
- cost-based (as opposed rule-based) query optimizer
- heuristics gathered on tables to optimize better
- costs of random seek, seq scan, etc. configurable
- e.g making random seeks on an SSD not a bad thing to the optimizer
- genetic algorithms to plan complex queries.
Better EXPLAIN
MySQL sample
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 2 | DERIVED | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5 | NULL | 1 | Using where |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
Postgres Sample
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.31..8.32 rows=1 width=0)
-> GroupAggregate (cost=0.00..8.30 rows=1 width=12)
Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
-> Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers (cost=0.00..8.27 rows=1 width=12)
Index Cond: ((user_id >= 69) AND (user_id <= 70))
Also… if you have a subquery, postgres won’t execute it
PostGIS (for geo queries)
- extensible GIS system
- R tree indexes for built-in geometric data types
- geometric data types
- free census data (TIGER) for geocoding (addresses→coordinates)
- Rails integration example from railsonpostgresql.com (site no longer exists).
- Restaurant.first(:conditions => [“thegeom && ?”, Polygon.fromcoordinates([[[xmin, ymin], [xmin, ymax], …]]], 4269)])
Full Text Search
- full-featured text indexing
- dictionaries (languages, stopwords, synonyms)
- customizable search ranking algorithms
- mature
- lighter weight solution than solr/lucene
Advanced SQL Features
- EXCEPT and INTERSECT statements in addition to UNION
- pl/*, custom aggs, UDFs, etc.
- Triggers
- Windowing functions (example from postgresql.org) (for advanced analytics)
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 11 | 5200 | 2
develop | 9 | 4500 | 3
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK