GitHub - multiprocessio/dsq: Commandline tool for running SQL queries against JS...
source link: https://github.com/multiprocessio/dsq
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
This is a CLI companion to DataStation (a GUI) for running SQL queries against data files. So if you want the GUI version of this, check out DataStation.
Install
Binaries for amd64 (x86_64) are provided for each release.
macOS, Linux
On macOS or Linux, you can run the following:
$ VERSION=0.1.0 $ curl -LO "https://github.com/multiprocessio/dsq/releases/download/$VERSION/dsq-$(uname -s | awk '{ print tolower($0) }')-x64-$VERSION.zip" $ unzip dsq*.zip $ sudo mv dsq /usr/local/bin/dsq
Or install manually from the releases
page, unzip and add
dsq
to your $PATH
.
Windows
Download the latest Windows
release, unzip it,
and add dsq
to your $PATH
.
Manual
If you are on another platform or architecture or want to grab the latest release, you can do so with Go 1.17+:
$ go install github.com/multiprocessio/dsq@latest
Usage
You can either pipe data to dsq
or you can pass a file name to it.
If you are passing a file, it must have the usual extension for its content type.
For example:
$ dsq testdata.json "SELECT * FROM {} WHERE x > 10"
$ dsq testdata.ndjson "SELECT name, AVG(time) FROM {} GROUP BY name ORDER BY AVG(time) DESC"
Piping data to dsq
When piping data to dsq
you need to set the -s
flag and specify
the file extension or MIME type.
For example:
$ cat testdata.csv | dsq -s csv "SELECT * FROM {} LIMIT 1"
$ cat testdata.parquet | dsq -s parquet "SELECT COUNT(1) FROM {}"
Multiple files and joins
You can pass multiple files to DSQ. As long as they are supported data
files in a valid format, you can run SQL against all files as
tables. Each table can be accessed by the string {N}
where N
is the
0-based index of the file in the list of files passed on the
commandline.
For example this joins two datasets of differing origin types (CSV and JSON).
$ dsq testdata/join/users.csv testdata/join/ages.json \ "select {0}.name, {1}.age from {0} join {1} on {0}.id = {1}.id"
Transforming data to JSON without querying
As a shorthand for dsq testdata.csv "SELECT * FROM {}"
to convert
supported file types to JSON you can skip the query and the converted
JSON will be dumped to stdout.
For example:
$ dsq testdata.csv [{...some csv data...},{...some csv data...},...]
Supported Data Types
Name File Extension(s) Notes
CSV
csv
TSV
tsv
, tab
JSON
json
Must be an array of objects. Nested object fields are ignored.
Newline-delimited JSON
ndjson
, jsonl
Parquet
parquet
Excel
xlsx
, xls
Currently only works if there is only one sheet.
ODS
ods
Currently only works if there is only one sheet.
Apache Error Logs
text/apache2error
Currently only works if being piped in.
Apache Access Logs
text/apache2access
Currently only works if being piped in.
Nginx Access Logs
text/nginxaccess
Currently only works if being piped in.
Engine
Under the hood dsq uses DataStation as a library and under that hood DataStation uses SQLite to power these kinds of SQL queries on arbitrary (structured) data.
Comparisons
The speed column is based on rough benchmarks based on q's benchmarks. Eventually I'll do a more thorough and public benchmark.
Name Link Speed Supported File Types Engine
q http://harelba.github.io/q/ Fast CSV, TSV Uses SQLite
textql https://github.com/dinedal/textql Ok CSV, TSV Uses SQLite
octoql https://github.com/cube2222/octosql Slow JSON, CSV, Excel, Parquet Custom engine missing many features from SQLite
dsq Here Ok CSV, TSV, JSON, Newline-delimited JSON, Parquet, Excel, ODS (OpenOffice Calc), Logs Uses SQLite
Community
How can I help?
Download the app and use it! Report bugs on Discord.
Before starting on any new feature though, check in on Discord!
Subscribe
If you want to hear about new features and how this works under the hood, sign up here.
License
This software is licensed under an Apache 2.0 license.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK