3

4db: A Dynamic File-based NoSQL Database for C++

 2 years ago
source link: https://www.codeproject.com/Articles/5318279/4db-A-Dynamic-File-based-NoSQL-Database-for-Cplusp
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Background

In this article, we will cover the background, interface, and implementation of 4db, a dynamic, file-based NoSQL database for C++ projects.

4db draws its roots from various things called metastrings over the years.

metastrings is survived in .NET by 4db.net in GitHub, but 4db is a new C++ port. It is modern, clean, and portable, and built directly on the SQLite C API. It should serve C++ developers in need of a simple file-based database well.

Interface

Using 4db is best demonstrated with an annotated sample program:

Shrink ▲   Copy Code
/// <summary>
/// This program demonstrates creating a 4db database and using all four supported commands
/// to populate, access, and manipulate the data.
/// 1. UPSERT
/// 2. SELECT
/// 3. DELETE
/// 4. DROP
/// </summary>
#include "ctxt.h"
#pragma comment(lib, "4db")

#include <stdio.h>

// used by main()
void addCar(fourdb::ctxt& context, int year, 
            const std::wstring& make, const std::wstring& model);

int main()
{
    // 4db is built on SQLite, so to create a 4db database,
    // we simply need to specify the location of the database file.
    // If the file does not exist, an empty database is automatically created.
    // The ctxt class manages the database connection,
    // provides many useful functions for executing SELECT queries,
    // and implements the UPSERT, DELETE and DROP functions.
    // There are many classes in 4db, but ctxt is the one you deal directly with;
    // you can auto the rest as seen here.
    printf("Opening database...\n");
    fourdb::ctxt context("cars.db");

    // Drop our database table to start things clean.
    printf("Starting up...\n");
    context.drop(L"cars");

    // Pass our context into addCar to add database records...so many cars...
    printf("Adding cars to database...\n");
    addCar(context, 1987, L"Nissan", L"Pathfinder");
    addCar(context, 1998, L"Toyota", L"Tacoma");
    addCar(context, 2001, L"Nissan", L"Xterra");
    //...

    // Select data out of the database using a basic dialect of SQL.
    // Here, we gather the "value" pseudo-column which 
    // is the primary key added by the addCar function.
    // We create a query object with our SELECT query,
    // pass in the value for the @year parameter,
    // and use ctxt::execQuery function to execute the query,
    // handing back a reader object to process the results.
    printf("Getting cars...\n");
    std::vector<fourdb::strnum> oldCarKeys;
    auto select =
        fourdb::sql::parse
        (
            L"SELECT value, year, make, model "
            L"FROM cars "
            L"WHERE year < @year "
            L"ORDER BY year ASC"
        );
    select.addParam(L"@year", 2000);
    auto reader = context.execQuery(select);
    while (reader->read())
    {
        // Collect the primary key ("value") that addCar added
        oldCarKeys.push_back(reader->getString(0));

        // 4db values are either numbers (doubles) or strings
        printf("%d: %S - %S\n", 
               static_cast<int>(reader->getDouble(1)), 
               reader->getString(2).c_str(),
               reader->getString(3).c_str());
    }

    // We use the list of primary keys to delete some rows.
    printf("Deleting old cars... (%u)\n", static_cast<unsigned>(oldCarKeys.size()));
    context.deleteRows(L"cars", oldCarKeys);

    printf("All done.\n");
    return 0;
}

/// <summary>
/// UPSERT a car into our database using the define function.
/// You pass the table name, primary key value, and column data to this function.
/// No need to explicitly create the table, just refer to it by name 
/// and the define function takes care of it.
/// NOTE: The primary key value and column data values
///       can only be strings or numbers.
///       For numbers, they have to be convertible to doubles,
///       and are selected out of the database as doubles.
/// </summary>
/// <param name="context">ctxt for doing database work</param>
/// <param name="year">year of the car</param>
/// <param name="make">make of the car</param>
/// <param name="model">model of the car</param>
void addCar(fourdb::ctxt& context, int year, 
            const std::wstring& make, const std::wstring& model)
{
    std::wstring tableName = L"cars";
    fourdb::strnum primaryKey = fourdb::num2str(year) + L"_" + make + L"_" + model;
    fourdb::paramap columnData
    {
        { L"year", year },
        { L"make", make },
        { L"model", model },
    };
    context.define(tableName, primaryKey, columnData);
}

Implementation

The metastrings concept was always to present what looks like a rows-and-columns SQL interface with an implementation in a "real" SQL database, initially only MySQL, now only SQLite.

The "virtual" schema's tables are pulled into separate real tables in SQLite:

  • A registry of all tables in the virtual schema is stored in the real tables, um, table
  • All columns in all tables are in the names table
  • Every unique value in the entire database - wstring or double - is stored in the values table
  • Each row in each virtual table is represented by a row in the real items table
  • Everything is glued together by the itemnamevalues table, one row per data cell, itemid -> nameid -> valueid

Tons of overhead, will probably never have high performance. But it's simple, and allows for a dynamic schema. And there are tons of use cases where high performance isn't a requirement. I see a bright future for this technology.

musicdb - A Larger Example

iTunes, on Windows at least, has a file describing the media library in an easily parsed XML file. There is an entry in this file for each track in the library. Each entry looks like this in the XML file:

Shrink ▲   Copy Code
<dict>
	<key>Track ID</key><integer>1002</integer>
	<key>Size</key><integer>7973544</integer>
	<key>Total Time</key><integer>242755</integer>
	<key>Disc Number</key><integer>1</integer>
	<key>Disc Count</key><integer>1</integer>
	<key>Track Number</key><integer>1</integer>
	<key>Track Count</key><integer>13</integer>
	<key>Year</key><integer>2012</integer>
	<key>Date Modified</key><date>2016-08-22T01:35:10Z</date>
	<key>Date Added</key><date>2021-03-17T00:41:46Z</date>
	<key>Bit Rate</key><integer>256</integer>
	<key>Sample Rate</key><integer>44100</integer>
	<key>Artwork Count</key><integer>1</integer>
	<key>Persistent ID</key><string>A8CF63F61390C4BC</string>
	<key>Track Type</key><string>File</string>
	<key>File Folder Count</key><integer>5</integer>
	<key>Library Folder Count</key><integer>1</integer>
	<key>Name</key><string>Kryptonite</string>
	<key>Artist</key><string>3 Doors Down</string>
	<key>Album Artist</key><string>3 Doors Down</string>
	<key>Composer</key><string>Matt Roberts</string>
	<key>Album</key><string>The Greatest Hits [+digital booklet]</string>
	<key>Genre</key><string>Alternative Rock</string>
	<key>Kind</key><string>MPEG audio file</string>
	<key>Comments</key><string>Amazon.com Song ID: 233359329</string>
	<key>Sort Album</key><string>Greatest Hits [+digital booklet]</string>
	<key>Location</key><string>file://localhost/C:/Users/ballo/Music/iTunes/iTunes%20Media/
    Music/3%20Doors%20Down/The%20Greatest%20Hits%20%5B+digital%20booklet%5D/
    01-01-%20Kryptonite.mp3</string>
</dict>

The musicdb program parses the entire XML file and builds an in-memory representation of the library. It makes a single 4db call to load that representation into the 4db database:

Shrink ▲   Copy Code
bool inDict = false;
std::unordered_map<fourdb::strnum, fourdb::paramap> dicts; // accumalate info of all tracks
fourdb::paramap dict; // info about the current track
...
while (xmlFileStream)
{
	std::wstring line;
	std::getline(xmlFileStream, line);
	const wchar_t* tag = wcsstr(line.c_str(), L"<");
	if (tag == nullptr)
		continue;

	if (_wcsicmp(tag, L"<dict>") == 0)
	{
		inDict = true;
		dict.clear();
	}
	else if (_wcsicmp(tag, L"</dict>") == 0)
	{
		inDict = false;
		if (dict.size() == 1)
			continue;

		std::wstring key; // just needs to be unique
		for (const auto& kvp : dict)
		{
			const auto& snum = kvp.second;
			if (snum.isStr())
				key += snum.str();
			else
				key += fourdb::num2str(snum.num());
			key += '|';
		}

		dicts.insert({ key, dict });
		++addedCount;
	}
	else if (inDict && wcsncmp(tag, L"<key>", 5) == 0) // metadata key line
	{
		const wchar_t* closingKey = wcsstr(tag, L"</key>");
		if (closingKey == nullptr)
		{
			printf("Unclosed <key>: %S\n", line.c_str());
			continue;
		}

		std::wstring key(tag + 5, closingKey);
		cleanXmlValue(key); // deal with <, etc.

		const auto& fieldNameIt = fieldNames.find(key); // look column name from iTunes 
                                                        // field name
		if (fieldNameIt == fieldNames.end())
			continue; // not a field we care about
		const auto& fieldName = fieldNameIt->second;

		// It's either integer, string, or date
		const wchar_t* valueTag = nullptr;

		valueTag = wcsstr(closingKey, L"<integer>");
		if (valueTag != nullptr)
		{
			const wchar_t* closingValue = wcsstr(tag, L"</integer>");
			if (closingValue == nullptr)
			{
	#ifdef _DEBUG
				printf("Unclosed <integer>: %S\n", line.c_str());
	#endif
				continue;
			}

			std::wstring valueStr(valueTag + 9, closingValue);
			double valueNum = _wtof(valueStr.c_str());
			dict.insert({ fieldName, valueNum });
			continue;
		}

		valueTag = wcsstr(closingKey, L"<string>");
		if (valueTag != nullptr)
		{
			const wchar_t* closingValue = wcsstr(tag, L"</string>");
			if (closingValue == nullptr)
			{
	#ifdef _DEBUG
				printf("Unclosed <string>: %S\n", line.c_str());
	#endif
				continue;
			}

			std::wstring valueStr(valueTag + 8, closingValue);
			cleanXmlValue(valueStr);
			dict.insert({ fieldName, valueStr });
		}

		valueTag = wcsstr(closingKey, L"<date>");
		... // dates are treated like strings
	}
	...
}

// This one line loads the entire iTunes library into the 4db database "tracks" table
// The lambda / printf business is for pacifying the user during the import process
context.define(L"tracks", dicts, [](const wchar_t* msg) { printf("%S...\n", msg); });

Once the 4db database is populated, you can query it using a basic dialect of SQL SELECT statements, and get query results in a pleasant format:

Shrink ▲   Copy Code
printf("> ");
std::wstring line;
std::getline(std::wcin, line);
...

auto select = fourdb::sql::parse(line);
auto paramNames = fourdb::extractParamNames(line);
if (!paramNames.empty())
{
	printf("\n");
	printf("Enter values for the parameters in your query;"
		   " put # in front of numeric values:\n");
	printf("\n");
	for (const auto& paramName : paramNames)
	{
		printf("%S: ", paramName.c_str());
		std::getline(std::wcin, line);
		if (!line.empty() && line[0] == '#')
			select.addParam(paramName, _wtof(line.substr(1).c_str()));
		else
			select.addParam(paramName, line);
	}
}

auto reader = context.execQuery(select);
auto colCount = reader->getColCount();

std::vector<std::vector<std::wstring>> matrix;
std::unordered_set<std::wstring> seenRowSummaries;

while (reader->read())
{
	std::vector<std::wstring> newRow;
	for (unsigned col = 0; col < colCount; ++col)
		newRow.push_back(reader->getString(col)); // getString works with all column types

	std::wstring newRowSummary = fourdb::join(newRow, L"\v");
	if (seenRowSummaries.find(newRowSummary) != seenRowSummaries.end())
		continue;

	seenRowSummaries.insert(newRowSummary);
	matrix.push_back(newRow);
}

printf("\n");

printf("Results: %u\n", static_cast<unsigned>(matrix.size()));
if (matrix.empty())
	continue;

printf("\n");

std::vector<std::wstring> headerRow;
for (unsigned col = 0; col < colCount; ++col)
	headerRow.push_back(reader->getColName(col));
matrix.insert(matrix.begin(), headerRow);

std::vector<unsigned> columnWidths;
for (const auto& header : headerRow)
	columnWidths.push_back(static_cast<unsigned>(header.size()));
for (const auto& row : matrix)
{
	for (size_t cellidx = 0; cellidx < columnWidths.size(); ++cellidx)
		columnWidths[cellidx] = std::max(columnWidths[cellidx], row[cellidx].size());
}

for (size_t cellidx = 0; cellidx < columnWidths.size(); ++cellidx)
{
	const auto& header = headerRow[cellidx];
	auto headerWidth = columnWidths[cellidx];
	printf("%S", header.c_str());
	for (size_t s = header.size(); s <= headerWidth; ++s)
		printf(" ");
}
printf("\n");

for (size_t cellIdx = 0; cellIdx < columnWidths.size(); ++cellIdx)
{
	auto headerWidth = columnWidths[cellIdx];
	for (size_t s = 0; s < headerWidth; ++s)
		printf("-");
	printf(" ");
}
printf("\n");

for (size_t rowIdx = 1; rowIdx < matrix.size(); ++rowIdx)
{
	const auto& row = matrix[rowIdx];
	for (size_t cellIdx = 0; cellIdx < columnWidths.size(); ++cellIdx)
	{
		const auto& value = row[cellIdx];
		auto headerWidth = columnWidths[cellIdx];
		printf("%S", value.c_str());
		for (size_t s = value.size(); s <= headerWidth; ++s)
			printf(" ");
	}
	printf("\n");
}

SQLite Wrapper

Central to 4db are wrapper classes around SQLite's C API. The db class manages the database connection and provides routines for executing queries. The dbreader class prepares and executes queries and provides access to query results.

Parameters are passed in using a paramap, which is a typedef of std::unordered_map<std::wstring, strnum>. strnum is a class that is either a wstring or a double.

It was fun learning about SQLite's C API. The wrapper classes mostly stand on their own; you can mold them to your purposes with little work.

Conclusion

I hope you've enjoyed learning about 4db, and seeing how fun processing your iTunes media library can be with 4db.

History

  • 22nd November, 2021: Initial version

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK