One-liner for running queries against CSV files with SQLite
source link: https://til.simonwillison.net/sqlite/one-line-csv-operations
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
One-liner for running queries against CSV files with SQLite
I figured out how to run a SQL query directly against a CSV file using the sqlite3
command-line utility:
sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' \
'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'
This uses the special :memory:
filename to open an in-memory database. Then it uses two -cmd
options to turn on CSV mode and import the taxi.csv
file into a table called taxi
. Then it runs the SQL query.
You can get taxi.csv
by downloading the compressed file from here and running:
7z e -aos taxi.csv.7z
I figured this out while commenting on this issue.
The output looks like this:
"",128020,32.2371511482553
0,42228,17.0214016766151
1,1533197,17.6418833067999
2,286461,18.0975870711456
3,72852,17.9153958710923
4,25510,18.452774990196
5,50291,17.2709248175672
6,32623,17.6002964166367
7,2,87.17
8,2,95.705
9,1,113.6
Add -cmd '.mode column'
to output in columns instead:
$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' -cmd '.mode column' \
'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'
passenger_count COUNT(*) AVG(total_amount)
--------------- -------- -----------------
128020 32.2371511482553
0 42228 17.0214016766151
1 1533197 17.6418833067999
2 286461 18.0975870711456
3 72852 17.9153958710923
4 25510 18.452774990196
5 50291 17.2709248175672
6 32623 17.6002964166367
7 2 87.17
8 2 95.705
9 1 113.6
Or use -cmd '.mode markdown'
to get a Markdown table:
passenger_count | COUNT(*) | AVG(total_amount) |
---|---|---|
128020 | 32.2371511482553 | |
0 | 42228 | 17.0214016766151 |
1 | 1533197 | 17.6418833067999 |
2 | 286461 | 18.0975870711456 |
3 | 72852 | 17.9153958710923 |
4 | 25510 | 18.452774990196 |
5 | 50291 | 17.2709248175672 |
6 | 32623 | 17.6002964166367 |
7 | 2 | 87.17 |
8 | 2 | 95.705 |
9 | 1 | 113.6 |
A full list of output modes can be seen like this:
% sqlite3 -cmd '.help mode'
.mode MODE ?TABLE? Set output mode
MODE is one of:
ascii Columns/rows delimited by 0x1F and 0x1E
box Tables using unicode box-drawing characters
csv Comma-separated values
column Output in columns. (See .width)
html HTML <table> code
insert SQL insert statements for TABLE
json Results in a JSON array
line One value per line
list Values delimited by "|"
markdown Markdown table format
quote Escape answers as for SQL
table ASCII-art table
tabs Tab-separated values
tcl TCL list elements
Other options
There are a whole bunch of other tools that can be used for this kind of thing!
My own sqlite-utils memory command can load data from JSON, CSV or TSV into an in-memory database and run a query against it. It's a LOT slower than using sqlite3
directly though.
dsq is a tool that does this kind of thing (and a lot more). Author Phil Eaton compiled a collection of benchmarks of other similar tools, and his benchmarking script demonstrates how to use each one of them.
Created 2022-06-20T16:00:51-07:00, updated 2022-06-20T19:06:35-07:00 · History · Edit
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK