source link: https://github.com/multiprocessio/datastation/tree/main/runner/cmd/dsq
dsq: Run SQL queries against JSON, CSV, Excel, Parquet, and more
Install
Get Go 1.17+ and then run:
$ go install github.com/multiprocessio/datastation/runner/cmd/dsq@latest
Usage
You can either pipe data to dsq or you can pass a file name to it.
When piping data to dsq, you need to specify the file extension or MIME type.
For example:
$ cat testdata.csv | dsq csv "SELECT * FROM {} LIMIT 1"
$ cat testdata.parquet | dsq parquet "SELECT COUNT(1) FROM {}"
If you are passing a file, it must have the usual extension for its content type.
For example:
$ dsq testdata.json "SELECT * FROM {} WHERE x > 10"
$ dsq testdata.ndjson "SELECT name, AVG(time) FROM {} GROUP BY name ORDER BY AVG(time) DESC"
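The two calling styles above can be pictured with a small Go sketch. It is an illustration only, not dsq's actual code: `resolveType`, `fillPlaceholder`, and the table name `input_table` are hypothetical. It assumes the first argument is either a file name (so the type comes from its extension) or an explicit extension/MIME type for piped input, and that `{}` in the query stands in for the loaded data.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveType is a hypothetical helper. If arg looks like a file name with an
// extension, it returns that extension; otherwise arg is assumed to already be
// an extension or MIME type given for piped input (e.g. "csv").
func resolveType(arg string) string {
	ext := strings.TrimPrefix(filepath.Ext(arg), ".")
	if ext != "" {
		return strings.ToLower(ext)
	}
	return arg
}

// fillPlaceholder is a hypothetical helper that swaps every "{}" in the query
// for a concrete table name.
func fillPlaceholder(query, table string) string {
	return strings.ReplaceAll(query, "{}", table)
}

func main() {
	// File argument: the type is taken from the extension.
	fmt.Println(resolveType("testdata.ndjson"))
	// Piped input: the type is given explicitly.
	fmt.Println(resolveType("text/apache2error"))
	// The table name here is made up for the example.
	fmt.Println(fillPlaceholder("SELECT * FROM {} LIMIT 1", "input_table"))
}
```

This is also why a file passed by name needs its usual extension: in a scheme like this, the extension is the only hint about the content type.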
Supported Data Types
Name | File Extension(s) or MIME Type | Notes
CSV | csv |
JSON | json | Must be an array of objects. Nested object fields are ignored.
Newline-delimited JSON | ndjson, jsonl |
Parquet | parquet |
Excel | xlsx, xls | Currently only works if there is only one sheet.
Apache Error Logs | text/apache2error | Currently only works if being piped in.
Apache Access Logs | text/apache2access | Currently only works if being piped in.
Nginx Access Logs | text/nginxaccess | Currently only works if being piped in.
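To make the JSON constraint concrete, here is a minimal Go sketch of what "an array of objects, with nested object fields ignored" can look like in practice. It is not taken from dsq; `flatRows` is a hypothetical helper, and the sketch assumes that only nested objects are dropped while every other value is kept as decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatRows is a hypothetical helper. It decodes a JSON array of objects and
// removes any field whose value is itself an object, keeping the rest.
func flatRows(data []byte) ([]map[string]interface{}, error) {
	var rows []map[string]interface{}
	if err := json.Unmarshal(data, &rows); err != nil {
		return nil, fmt.Errorf("input must be an array of objects: %w", err)
	}
	for _, row := range rows {
		for key, value := range row {
			if _, nested := value.(map[string]interface{}); nested {
				delete(row, key) // nested object fields are ignored
			}
		}
	}
	return rows, nil
}

func main() {
	input := []byte(`[{"name": "a", "x": 11, "meta": {"tag": "ignored"}}]`)
	rows, err := flatRows(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rows)

	// A top-level object (not an array) is rejected.
	_, err = flatRows([]byte(`{"x": 1}`))
	fmt.Println(err != nil)
}
```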
Engine
Under the hood, dsq uses DataStation as a library, and under that hood DataStation uses SQLite to power these kinds of SQL queries on arbitrary (structured) data.
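As a rough picture of the general idea (not DataStation's actual code), structured rows can be loaded into a SQLite table and the user's query then runs against that table. The Go sketch below only builds the SQL strings, so it runs without a SQLite driver. The helper names and the table name `input_table` are hypothetical, and columns are declared without types, which SQLite permits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// quoteIdent wraps an identifier in double quotes, doubling embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// columnsOf collects the distinct keys across all rows, sorted for stable output.
func columnsOf(rows []map[string]interface{}) []string {
	seen := map[string]bool{}
	var cols []string
	for _, row := range rows {
		for key := range row {
			if !seen[key] {
				seen[key] = true
				cols = append(cols, key)
			}
		}
	}
	sort.Strings(cols)
	return cols
}

// createTableSQL builds a CREATE TABLE statement with untyped columns.
func createTableSQL(table string, cols []string) string {
	quoted := make([]string, len(cols))
	for i, c := range cols {
		quoted[i] = quoteIdent(c)
	}
	return fmt.Sprintf("CREATE TABLE %s (%s)", quoteIdent(table), strings.Join(quoted, ", "))
}

// insertSQL builds a parameterized INSERT with one "?" per column.
func insertSQL(table string, cols []string) string {
	marks := strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ")
	return fmt.Sprintf("INSERT INTO %s VALUES (%s)", quoteIdent(table), marks)
}

func main() {
	rows := []map[string]interface{}{
		{"name": "a", "time": 1.5},
		{"name": "b", "time": 2.0},
	}
	cols := columnsOf(rows)
	fmt.Println(createTableSQL("input_table", cols))
	fmt.Println(insertSQL("input_table", cols))
}
```

Once the rows are in a table like this, a query such as `SELECT name, AVG(time) FROM {} GROUP BY name` only needs its placeholder pointed at that table.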
Comparisons
The speed column comes from rough benchmarks modeled on q's benchmarks. Eventually I'll do a more thorough and public benchmark.
Name | Link | Speed | Supported File Types | Engine | Maturity
q | http://harelba.github.io/q/ | Fast | CSV, TSV | Uses SQLite | Mature
textql | https://github.com/dinedal/textql | Ok | CSV, TSV | Uses SQLite | Mature
octosql | https://github.com/cube2222/octosql | Slow | JSON, CSV, Excel, Parquet | Custom engine missing many features from SQLite | Mature
dsq | Here | Ok | CSV, JSON, Newline-delimited JSON, Parquet, Excel, Logs | Uses SQLite | Not mature
License, support, community, whatnot
See the repo's main README.md for the details.