4

sqlite-utils: a nice way to import data into SQLite for analysis

 2 years ago
source link: https://jvns.ca/blog/2022/05/12/sqlite-utils--a-nice-way-to-import-data-into-sqlite/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

sqlite-utils: a nice way to import data into SQLite for analysis



Hello! This is a quick post about a nice tool I found recently called sqlite-utils, from the tools category.

Recently I wanted to do some basic data analysis using data from my Shopify store. So I figured I’d query the Shopify API and import my data into SQLite, and then I could make queries to get the graphs I want.

But this seemed like a lot of boring work, like I’d have to write a schema and write a Python program. So I hunted around for a solution, and I found sqlite-utils, a tool designed to make it easy to import arbitrary data into SQLite to do data analysis on the data.

sqlite-utils automatically generates a schema

The Shopify data has about a billion fields and I really did not want to type out a schema for it. sqlite-utils solves this problem: if I have an array of JSON orders, I can create a new SQLite table with that data in it like this:

import sqlite_utils

orders = ... # (some code to get the `orders` array here)

db = sqlite_utils.Database('orders.db')
db['shopify_orders'].insert_all(orders)

you can alter the schema if there are new fields (with alter)

Next, I ran into a problem where on the 5th page of downloads, the JSON contained a new field that I hadn’t seen before.

Luckily, sqlite-utils thought of that: there’s an alter flag which will update the table’s schema to include the new fields. ```

Here’s what the code for that looks like

db['shopify_orders'].insert_all(orders, alter=True)

you can deduplicate existing rows (with upsert)

Next I ran into a problem where sometimes when doing a sync, I’d download data from the API where some of it was new and some wasn’t.

So I wanted to do an “upsert” where it only created new rows if the item didn’t already exist. sqlite-utils also thought of this, and there’s an upsert method.

For this to work you have to specify the primary key. For me that was pk="id". Here’s what my final code looks like:

db['shopify_orders'].upsert_all(
    orders,
    pk="id",
    alter=True
)

there’s also a command line tool

I’ve talked about using sqlite-utils as a library so far, but there’s also a command line tool which is really useful.

For example, this inserts the data from a plants.csv into a plants table:

sqlite-utils insert plants.db plants plants.csv --csv

format conversions

I haven’t tried this yet, but here’s a cool example from the help docs of how you can do format conversions, like converting a string to a float:

sqlite-utils insert plants.db plants plants.csv --csv --convert '
return {
  "name": row["name"].upper(),
  "latitude": float(row["latitude"]),
  "longitude": float(row["longitude"]),
}'

This seems really useful for CSVs, where by default it’ll often interpret numeric data as strings if you don’t do this conversions.

metabase seems nice too

Once I had all the data in SQLite, I needed a way to draw graphs with it. I wanted some dashboards, so I ended up using Metabase, an open source business intelligence tool. I found it very straightforward and it seems like a really easy way to turn SQL queries into graphs.

This whole setup (sqlite-utils + metabase + SQL) feels a lot easier to use than my previous setup, where I had a custom Flask website that used plotly and pandas to draw graphs.

that’s all!

I was really delighted by sqlite-utils, it was super easy to use and it did everything I wanted.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK