GitHub - alash3al/xyr: [WIP] Query any data source using SQL, works with the loc...

source link: https://github.com/alash3al/xyr

xyr [WIP]

xyr is a very lightweight, simple, and powerful data ETL platform that helps you to query available data sources using SQL.

Example

Here we define a new table called users, which loads every JSON file in the given directory (recursively). Each file may use any of the following layouts: a single object or an array of objects per file, newline-delimited objects or arrays, or even objects or arrays with no delimiter at all (like the JSON output format of Kinesis Firehose).

# this file is `./config.xyr.hcl`
table "users" {
    // the driver we want
    driver = "jsondir"

    // the data source directory
    source = "/tmp/data/users"

    // xyr will try to create a table in its internal storage, so it needs
    // to know at least the names of the columns you require from your data.
    // e.g. {"id": 1, "email": "[email protected]", "age": 20}
    // here we only need "id" and "email", so we list both in the columns array below;
    // note that the ordering is the same as in our example.
    columns = ["id", "email"]

    // what do you want to load
    // in case of jsondir, we can specify a regex pattern to filter the files 
    // using the filename
    // but if we're using an SQL driver we can provide an sql statement that reads the data
    // from the source SQL based database.
    loader = ".*"
}

Then import the table:

$ xyr table:import users
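
To make the accepted JSON layouts concrete, here is a minimal Python sketch that reads all of them with one routine. It is not xyr's actual parser (xyr is a separate tool and its internals are not shown here); the function name iter_json_objects and the sample strings are hypothetical, chosen only for illustration.

```python
import json


def iter_json_objects(text):
    """Yield every JSON object found in text, whatever the layout."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        # A top-level array contributes each of its elements.
        if isinstance(value, list):
            yield from value
        else:
            yield value


# Hypothetical samples, one per layout described above.
single = '{"id": 1, "email": "a@example.com", "age": 20}'
array = '[{"id": 1}, {"id": 2}]'
ndjson = '{"id": 1}\n{"id": 2}\n'
concatenated = '{"id": 1}{"id": 2}[{"id": 3}]'

for sample in (single, array, ndjson, concatenated):
    print([row["id"] for row in iter_json_objects(sample)])
```

The same loop handles every layout because it decodes one JSON value at a time and continues from where the previous value ended, so the separator between values (newline, whitespace, or nothing) does not matter.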

Installation

Use the Docker package linked from the project repository.

Supported Drivers

Driver       Source
jsondir      /PATH/TO/JSON/DATA/DIR
mysql        username:password@tcp(server:port)/dbname?option1=value1&...
postgres     postgresql://username:password@server:port/dbname?option1=value1
sqlite3      /path/to/db.sqlite?option1=value1
sqlserver    sqlserver://username:password@host/instance?param1=value&param2=value
             sqlserver://username:password@host:port?param1=value&param2=value
             sqlserver://sa@localhost/SQLExpress?database=master&connection+timeout=30
hana         hdb://user:password@host:port
clickhouse   tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000
oracle       tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000

Use Cases

  • Simple Presto Alternative.
  • Simple AWS Athena Alternative.
  • Convert your JSON documents into a SQL DB.
  • Query your CSV files easily and join them with other data.

How does it work?

Internally, xyr uses SQLite as an embedded SQL datastore (this may change in the future, and support for multiple datastores may be added). Once you define a table in the XYRCONFIG file, running $ xyr table:import imports all defined tables, and you can then query them via $ xyr exec "SELECT * FROM TABLE_NAME_HERE", which outputs the result as JSON by default.
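
The import-then-query flow can be sketched in a few lines of Python with the standard sqlite3 module. This is only an illustration of the idea under stated assumptions, not xyr's implementation: the helper names import_table and exec_query, the in-memory database, and the sample records are all hypothetical.

```python
import json
import sqlite3

# Hypothetical records standing in for the files under /tmp/data/users.
records = [
    {"id": 1, "email": "a@example.com", "age": 20},
    {"id": 2, "email": "b@example.com", "age": 31},
]
columns = ["id", "email"]  # mirrors `columns` in the table definition


def import_table(conn, name, columns, records):
    """Create the table and copy only the requested columns into it."""
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{name}"')
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    conn.executemany(
        f'INSERT INTO "{name}" ({cols}) VALUES ({marks})',
        [tuple(r.get(c) for c in columns) for r in records],
    )
    conn.commit()


def exec_query(conn, sql):
    """Run a query and return the rows as a JSON array of objects."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return json.dumps([dict(zip(names, row)) for row in cur.fetchall()])


conn = sqlite3.connect(":memory:")
import_table(conn, "users", columns, records)
result = exec_query(conn, "SELECT * FROM users WHERE id > 1")
print(result)
```

Fields that are not listed in columns (here, age) are simply dropped at import time, which matches the idea that the table definition only needs to name the columns you care about.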

Roadmap

  • Building the initial core.
  • Add the basic import command for importing the tables into xyr.
  • Add the exec command to execute SQL query.
  • Add well known SQL drivers
    • mysql
    • postgres
    • sqlite3
    • clickhouse
    • oracle
    • sqlserver
  • Add an S3 driver
  • Adding/improving the documentation.
  • Expose another API beside the CLI to enable external Apps to query xyr.
    • JSON Endpoint?
    • Mysql Protocol?
    • Redis Protocol?
  • Improving the code base (iteration 1).
