Linux Fu: Databases Are Next-Level File Systems

 3 years ago
source link: https://hackaday.com/2021/06/08/linux-fu-databases-are-next-level-file-systems/

It is funny how exotic computer technology eventually either fails or becomes commonplace. At one time, having more than one user on a computer at once was high tech, for example. Then there are things that didn’t catch on widely, like vector displays or content-addressable memory. The use of mass storage, especially disk drives, has become very widespread in computers, though. But at one time it was an exotic technique and wasn’t nearly as simple as it is today.

However, I’m surprised that the filesystem as we know it hasn’t changed much over the years. Sure, compared to, say, the 1960s we have a lot better functionality. And we have lots of improvements surrounding speed, encoding, encryption, compression, and so on. But the fundamental nature of how we store and access files in computer programs is stagnant. But it doesn’t have to be. We know of better ways to organize data, but for some reason, most of us don’t use them in our programs. Turns out, though, it is reasonably simple and I’m going to show you how with a toy application that might be the start of a database for the electronic components in my lab.

You could store a database like this in a comma-delimited file or using something like JSON. But I’m going to use a full-featured SQLite database to avoid having a heavy-weight database server and all the pain that entails. Is it going to replace the database behind the airline reservation system? No. But will it work for most of what you are likely to do? You bet.

Abstraction

If you think about it, the file system is nothing more than an abstraction over the disk drive. We normally don’t know or care where exactly hello.c is stored. We don’t even care if it is encrypted or compressed. It could be fetched over a network or all the pieces of it could be scattered randomly across the disk. We don’t usually care. What if you abstracted the filesystem itself?

That’s pretty much the idea of a database. If I have a list of, say, electronic components, I could store them in a comma-delimited file and read it with a spreadsheet. Or I could use a full-up database. The problem with databases is that traditionally they require some server software like MySQL, SQL Server, or Oracle, for example. You can abstract the database interface, but it is a pretty heavy solution compared to just opening a file and using it normally.

However, there is a frequently used library called SQLite that provides a pretty robust database that can live in a single file with no external server or maintenance. There are limitations, of course, but for a lot of simple programs it can bring the benefits of a database without the overhead and expense.

The Right Tool for the Right Job

Of course, there are limitations. However, if you are rolling your own file format for something, you might want to consider switching to SQLite and handling it as a database. According to the project’s website, doing so may actually save space and increase access speed. Plus, once you get the hang of it, it is just easier. It is also easier to scale later if you decide to switch to a real database.

[Image: a GUI browser for SQLite databases]

If you are storing huge databases (like terabyte-scale) or you need many concurrent users — especially writing to the database — this might not be for you. The SQLite website has a good page about what uses are good and which are not optimal for the library.

One other advantage: There is a command line program (and some GUI variations like the browser in the accompanying image) that lets you work with SQLite databases without writing any code. So you can do things like populate your data or examine your database without having to write SQL at all. For a custom file format, you’d probably have to do everything yourself or populate and debug data with a generic tool that doesn’t know about your specific data.

My Task

I don’t want to develop an entire application in a post, nor do I want to teach SQL, the structured query language that most databases, including SQLite, use. But I want to show you how easy it is to get a start on a simple electronics database using C. The C code will turn out to be the least of our problems. The two things you’ll want to understand most are how to structure the data (the database schema) and how to populate the initial data. Even if you want to have your program add data eventually, it is nice to start with a little data initially to get your program working.

Database Fundamentals

A modern relational database has one or more tables. Each table has rows of data. A row has one or more columns and each column has a data type. For example, you might have a text column for serial number, a real number value for test point voltage, and a boolean for pass/fail.

Each table has a unique ID for each row. The database will provide a hidden one for you if you don’t, but usually, you’ll want to declare this ID column yourself. The database will still help you by automatically incrementing the number and ensuring it is unique for each row.

If this is all there was to it, there wouldn’t be many advantages over a comma-delimited file. But we can do a lot of things better once we have this organizational structure. For example, it is easy to ask the database to sort items or pick the highest three voltages out of the table.

However, one of the biggest advantages of a database is being able to do joins. Suppose I have a list of components: a PC board, a resistor, a battery holder, and an LED. I have a table that has one row corresponding to each of them. Now suppose I want to have a table of assemblies that are made up of components.

I could take a simple approach:

Table Component
ID    Name
===========
1     PCB
2     Resistor
3     LED
4     Battery Holder

Table Assembly
ID    Name       Components
============================
1     Blink1     PCB, Resistor, LED, Battery Holder
2     Blink2     PCB, Resistor, LED, Resistor, LED, Battery Holder



That's ugly and wasteful. A better approach would be to use three tables:

Table Component
ID    Name
===========
1     PCB
2     Resistor
3     LED
4     Battery Holder

Table Assembly
ID    Name
===========
1     Blink1
2     Blink2

Table Assembly_Parts
Assembly    Component    Quan
=============================
1           1            1
1           2            1
1           3            1
1           4            1
2           1            1
2           2            2
2           3            2
2           4            1

Using a join operation, you can bind these tables together to generate what amounts to the first table without duplicating lots of data.

For my toy database, then, I’m going to create three tables. The part table will contain the parts I have. The partnums table will hold types of parts (e.g., a 7805 vs. a 2N2222 or a CDP1802). Finally, a locations table will tell me where I store things. There are other ways this could be structured. For example, there could be a table to store types of footprints: a 2N2222 can be in a TO-92 or a surface mount package. In addition, I’m going to create a view that shows everything unrolled like in the first example. A view is something that isn’t stored but acts like a table for convenience. In reality, it is just a query on the database that you can work with.

There’s a lot more to it, of course. There are inner and outer joins and a lot of other details and nuances. Luckily, there’s plenty of material to read about databases on the Web including the SQLite documentation.

Just Enough SQL

For our purposes, we are only going to use a handful of SQL statements: create, insert, and select. There is an executable, sqlite3, where you can enter database commands. You can provide the name of the database on the command line and that’s the easiest way to go. Use .exit when you want to exit.

You can probably figure out the SQL syntax since it is pretty verbose:

create table part ( id integer not null primary key, name text, partnum integer, value text, 
   units text, quantity integer, photo blob, data text, location integer, footprint text);
create table partnums (id integer not null primary key, partnum text, desc text);

create table locations (id integer not null primary key, location text, desc text);

create view full as select part.id, name, partnums.partnum as part_number, value, units, 
   quantity, data, locations.location as location, footprint from part 
   inner join partnums on part.partnum = partnums.id
   inner join locations on locations.id = part.location;

I just made those calls in the sqlite3 command line program although I could have used the GUI or — if I wanted to — I could make my C program execute those commands.  I also used the command line to insert a few test records. For example:

insert into locations (location,desc) values ('Shop - storage II','Storage over computer desk in shop');
insert into partnums(partnum,desc) values('R.25W','Quarter Watt Resistor');
insert into part(partnum,quantity,location,value,units) values (2,111,1,'10K','ohms');

To get data back out, you’ll use the select command:

select * from part;

select partnum, quantity from part where quantity<5;

If you want to know more, there are plenty of SQL tutorials on the web.

Programming!

So far, none of this has required programming. Assuming you have the libsqlite3-dev package or its equivalent, you don’t need much to add database functions to your C program. You’ll need to include sqlite3.h. If you can’t find it, you probably don’t have the development files installed. You’ll also need to link with libsqlite3. For a simple single file project, this makefile will probably get you started:

CC=gcc
CFLAGS+=-std=c99 -g
LDFLAGS=-g
LDLIBS+=-lsqlite3

edatabase : main

main : main.c

The code itself is straightforward. You need to open the database file (sqlite3_open). Instead of a file, you can pass “:memory:” to get an in-memory database that won’t last beyond the life of your program. The call will give you a handle back to your database. Next, you have to parse or prepare the SQL statement you want to execute. This could be any of the SQL we’ve executed through the interface or lots of other SQL statements. In my case, I want to pull the data from the full view and display it, so I’ll parse:

select * from full;

Finally, you’ll call sqlite3_step and while it returns SQLITE_ROW, you can process the row using calls like sqlite3_column_text. At the end, you finalize the statement and close the database. Here’s the code with error handling removed:

#include <sqlite3.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    sqlite3 *db;          /* database handle */
    sqlite3_stmt *sql;    /* prepared statement */
    int rv;

    rv = sqlite3_open("parts.db", &db);
    rv = sqlite3_prepare_v2(db, "SELECT * from full", -1, &sql, NULL);
    do
    {
        rv = sqlite3_step(sql);    /* fetch the next row, if any */
        if (rv == SQLITE_ROW)
        {
            /* column 0 is the part id, column 2 is the part number */
            printf("%s,", sqlite3_column_text(sql, 0));
            printf("%s\n", sqlite3_column_text(sql, 2));
        }
    } while (rv == SQLITE_ROW);
    sqlite3_finalize(sql);    /* release the statement */
    sqlite3_close(db);
    return 0;
}

Or, have a look at the full code. In a case where you didn’t care about stepping through rows, you might have called sqlite3_exec. Even the documentation admits this is just a wrapper around a prepare, a step, and a finalize so you can just pass in a string and expect it to work.

Of course, there are many more calls. For example, you can call sqlite3_column_int or other calls to get particular types. You can bind parameters to SQL calls to set values instead of building a string. But this shows you just how easy it can be to do a simple SQLite program.

So next time you find yourself inventing a new file format, think about using SQLite instead. You’ll get free tools, and once you learn SQL, you’ll find there is a lot you can do without writing any actual code other than different SQL commands. You can even use Git-like branching to keep versions of your database. Then again, some people use Git as a database, but we don’t suggest it.

