
datastation/runner/cmd/dsq at main · multiprocessio/datastation · GitHub

 2 years ago
source link: https://github.com/multiprocessio/datastation/tree/main/runner/cmd/dsq

dsq: Run SQL queries against JSON, CSV, Excel, Parquet, and more

Install

Get Go 1.17+ and then run:

$ go install github.com/multiprocessio/datastation/runner/cmd/dsq@latest

Usage

You can either pipe data to dsq or pass it a file name.

When piping data to dsq, you need to specify the file extension or MIME type as the first argument.

For example:

$ cat testdata.csv | dsq csv "SELECT * FROM {} LIMIT 1"
$ cat testdata.parquet | dsq parquet "SELECT COUNT(1) FROM {}"

If you are passing a file, it must have the usual extension for its content type.

For example:

$ dsq testdata.json "SELECT * FROM {} WHERE x > 10"
$ dsq testdata.ndjson "SELECT name, AVG(time) FROM {} GROUP BY name ORDER BY AVG(time) DESC"
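
The two invocation styles differ only in where the type information comes from: an explicit argument when piping, or the file name's extension otherwise. The following Python sketch illustrates that lookup using the extensions listed under Supported Data Types. It is only an illustration under that assumption: `EXTENSIONS` and `resolve_type` are hypothetical names and are not part of dsq.

```python
import os

# Extensions copied from the "Supported Data Types" list. The mapping itself
# is illustrative and is not dsq's internal table.
EXTENSIONS = {
    "csv": "CSV",
    "json": "JSON",
    "ndjson": "Newline-delimited JSON",
    "jsonl": "Newline-delimited JSON",
    "parquet": "Parquet",
    "xlsx": "Excel",
    "xls": "Excel",
}


def resolve_type(arg):
    """Return the data type for a bare extension ("csv") or a file name ("testdata.csv")."""
    _, ext = os.path.splitext(arg)
    # A bare extension such as "csv" has no dot, so fall back to the argument itself.
    key = (ext[1:] if ext else arg).lower()
    if key not in EXTENSIONS:
        raise ValueError(f"unsupported or missing extension: {arg!r}")
    return EXTENSIONS[key]
```

A file name without a recognized extension raises `ValueError` here, which mirrors the requirement that a file passed by name must have the usual extension for its content type.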

Supported Data Types

| Name | File Extension(s) or MIME Type | Notes |
|---|---|---|
| CSV | csv | |
| JSON | json | Must be an array of objects. Nested object fields are ignored. |
| Newline-delimited JSON | ndjson, jsonl | |
| Parquet | parquet | |
| Excel | xlsx, xls | Currently only works if there is only one sheet. |
| Apache Error Logs | text/apache2error | Currently only works if being piped in. |
| Apache Access Logs | text/apache2access | Currently only works if being piped in. |
| Nginx Access Logs | text/nginxaccess | Currently only works if being piped in. |
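
To make the JSON notes concrete, here is a small Python sketch of the two JSON shapes: a single array of objects versus one object per line. It is not dsq's code. The function names are hypothetical, and `drop_nested` reflects one assumed reading of "nested object fields are ignored" (object-valued fields are dropped from each row).

```python
import json


def rows_from_json(text):
    """Parse a JSON document that must be an array of objects."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(r, dict) for r in data):
        raise ValueError("expected an array of objects")
    return data


def rows_from_ndjson(text):
    """Parse newline-delimited JSON: one object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def drop_nested(row):
    """Drop fields whose value is itself an object (assumed meaning of 'ignored')."""
    return {k: v for k, v in row.items() if not isinstance(v, dict)}
```

For example, `drop_nested({"a": 1, "b": {"c": 2}})` keeps only the field `a`.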

Engine

Under the hood, dsq uses DataStation as a library. DataStation, in turn, uses SQLite to power these kinds of SQL queries on arbitrary (structured) data.
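
The general idea of running SQL over structured rows with SQLite can be sketched in a few lines of Python using the standard `sqlite3` module. This is a rough illustration only, not how dsq or DataStation is implemented: `query_rows`, the table name `t_0`, and the handling of the `{}` placeholder are all assumptions made for the example, and rows are assumed to hold only scalar values.

```python
import sqlite3


def quote(name):
    """Quote an SQLite identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def query_rows(rows, sql, table="t_0"):
    """Load a list of flat dicts into an in-memory SQLite table and run sql.

    Every "{}" in sql is replaced with the (hypothetical) table name.
    """
    # Collect column names in first-seen order across all rows.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    if not columns:
        raise ValueError("no columns found in input rows")

    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(quote(c) for c in columns)
        conn.execute(f"CREATE TABLE {quote(table)} ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        # Missing keys become NULL.
        conn.executemany(
            f"INSERT INTO {quote(table)} VALUES ({placeholders})",
            [tuple(row.get(c) for c in columns) for row in rows],
        )
        cur = conn.execute(sql.replace("{}", quote(table)))
        names = [d[0] for d in cur.description]
        return [dict(zip(names, r)) for r in cur.fetchall()]
    finally:
        conn.close()
```

With rows parsed from a JSON array, a call such as `query_rows(rows, "SELECT * FROM {} WHERE x > 10")` corresponds to the usage examples earlier in this README.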

Comparisons

The speed column comes from rough benchmarks modeled on q's benchmarks. Eventually I'll do a more thorough and public benchmark.

| Name | Link | Speed | Supported File Types | Engine | Maturity |
|---|---|---|---|---|---|
| q | http://harelba.github.io/q/ | Fast | CSV, TSV | Uses SQLite | Mature |
| textql | https://github.com/dinedal/textql | Ok | CSV, TSV | Uses SQLite | Mature |
| octosql | https://github.com/cube2222/octosql | Slow | JSON, CSV, Excel, Parquet | Custom engine missing many features from SQLite | Mature |
| dsq | Here | Ok | CSV, JSON, Newline-delimited JSON, Parquet, Excel, Logs | Uses SQLite | Not mature |

License, support, community, whatnot

See the repo's main README.md for the details.
