7

One-liner for running queries against CSV files with SQLite

 2 years ago
source link: https://til.simonwillison.net/sqlite/one-line-csv-operations
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

One-liner for running queries against CSV files with SQLite

I figured out how to run a SQL query directly against a CSV file using the sqlite3 command-line utility:

sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' \
  'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'

This uses the special :memory: filename to open an in-memory database. Then it uses two -cmd options to turn on CSV mode and import the taxi.csv file into a table called taxi. Then it runs the SQL query.

You can get taxi.csv by downloading the compressed file from here and running:

7z e -aos taxi.csv.7z

I figured this out while commenting on this issue.

The output looks like this:

"",128020,32.2371511482553
0,42228,17.0214016766151
1,1533197,17.6418833067999
2,286461,18.0975870711456
3,72852,17.9153958710923
4,25510,18.452774990196
5,50291,17.2709248175672
6,32623,17.6002964166367
7,2,87.17
8,2,95.705
9,1,113.6

Add -cmd '.mode column' to output in columns instead:

$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' -cmd '.mode column' \
    'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'
passenger_count  COUNT(*)  AVG(total_amount)
---------------  --------  -----------------
                 128020    32.2371511482553 
0                42228     17.0214016766151 
1                1533197   17.6418833067999 
2                286461    18.0975870711456 
3                72852     17.9153958710923 
4                25510     18.452774990196  
5                50291     17.2709248175672 
6                32623     17.6002964166367 
7                2         87.17            
8                2         95.705           
9                1         113.6            

Or use -cmd '.mode markdown' to get a Markdown table:

passenger_count COUNT(*) AVG(total_amount)
128020 32.2371511482553
0 42228 17.0214016766151
1 1533197 17.6418833067999
2 286461 18.0975870711456
3 72852 17.9153958710923
4 25510 18.452774990196
5 50291 17.2709248175672
6 32623 17.6002964166367
7 2 87.17
8 2 95.705
9 1 113.6

A full list of output modes can be seen like this:

% sqlite3 -cmd '.help mode'
.mode MODE ?TABLE?       Set output mode
   MODE is one of:
     ascii     Columns/rows delimited by 0x1F and 0x1E
     box       Tables using unicode box-drawing characters
     csv       Comma-separated values
     column    Output in columns.  (See .width)
     html      HTML <table> code
     insert    SQL insert statements for TABLE
     json      Results in a JSON array
     line      One value per line
     list      Values delimited by "|"
     markdown  Markdown table format
     quote     Escape answers as for SQL
     table     ASCII-art table
     tabs      Tab-separated values
     tcl       TCL list elements

Other options

There are a whole bunch of other tools that can be used for this kind of thing!

My own sqlite-utils memory command can load data from JSON, CSV or TSV into an in-memory database and run a query against it. It's a LOT slower than using sqlite3 directly though.

dsq is a tool that does this kind of thing (and a lot more). Author Phil Eaton compiled a collection of benchmarks of other similar tools, and his benchmarking script demonstrates how to use each one of them.

Created 2022-06-20T16:00:51-07:00, updated 2022-06-20T19:06:35-07:00 · History · Edit


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK