Cameron Swift

February 2, 2023 8 minute read

SAP HANA Cloud Document Store – When Would I Use A Document Store?

Introduction

There are a number of wonderful blog posts here on the SAP Community talking about how you can get started with SAP HANA Cloud Document Store, and what you can do with it

For this blog post, I wanted to meet absolute beginners where they are:

Introducing the design considerations behind using the more familiar Relational (Row and Column) Stores
Discussing where we might run into challenges when using Relational Stores
Looking at where a Document Store might help
Explaining at a high-level what SAP HANA Cloud Document Store is

This blog post is intended to be a high level introduction – it won’t address all design considerations, and will simplify where able. Those looking for more in-depth technical articles can find some recommendations under Other Resources at the end of this post

Relational Stores Recap

When working with a database using Relational Stores, we want to start by deciding how we’ll organise and structure our data. This structure is called a Data Model

To recap a very simplified Relational Data Model, let’s look at how we might store information about a book

Our simplified book data

You can describe this book in many other ways – it may be brown, it may be sold in a hardcover, and it may be 388 pages long

We only want to store information that will be useful to our business. For our example, the list above will be enough

With an understanding of which information we care about, we know what we’ll need to store in a Database Table. Someone looking at this problem for the first time might sketch out a table as below

An initial table

A book’s ISBN is unique – each ISBN refers to exactly one published book. This uniqueness lets us use the ISBN as something called a Primary Key (indicated by the key symbol)

By looking at our Primary Key Column (in this case ISBN), we can tell which book each row of data is about

Without knowing better we deploy our table in a database and people begin using it. Before long, our colleagues come to us with issues

Challenges with “One Central Table”

A colleague wants to search for all books written by Cameron Swift. Except it turns out there’s more than one person in the world called Cameron Swift

To help tell them apart, we add an Author ID that is unique to each Author. While we’re at it, we give our Publishers a Publisher ID too

Adding IDs to our Table

A second colleague wants to enter a book with two Authors.In theory, we could add another set of Author ID and Name columns. Unfortunately, that won’t help us when we need to enter a book with three Authors, let alone four or more Authors

To help us store this data, we start by separating the data into multiple tables. We move the Author and Publisher data out into their own tables

Splitting%20our%20data%20into%20Books%2C%20Authors%20and%20Publishers%20Tables

Splitting our data into Books, Authors and Publishers Tables

As we can see above, the Author ID and Publisher ID also uniquely identify each row in our Authors and Publishers tables. These are the Primary Keys for those tables, in the same way that ISBN is for our Books table

“Relational” Data

There’s no point storing our data across different tables unless we know how they’re linked so we can get it back out again

This is where the term Relational Data comes from – in order to make sense of the data, we need to understand our tables and the Relationships between those tables

The way we represent this relationship is by adding the Primary Key from one table as a column in another table

As an example, we’re going to link the Publishers table to our Books table by including the primary key for our Publisher table (the Publisher ID) as a column in our Books table

Linking%20our%20Books%20and%20Publishers%20Tables

Linking our Books and Publishers Tables

By matching up the Publisher ID columns from our Books and Publishers tables, we can now easily find out the publisher for each book. Now that we understand how this works for Publishers, we can look at our Authors

Including Multiple Authors

When we think about including the Author IDs inside our book table, it may sound like we have the same problem we started with. As we showed earlier, we can’t include multiple Author IDs in our Books table, because there can be any number of Authors (with the exception of zero, because books don’t write themselves)

We need to create a third tableto link our Books and Authors tables. We’ll refer to this as a Linking Table. This table is a list of ISBNs and associated Author IDs

Our%20BooksAuthored%20Table%20links%20the%20Books%20and%20Authors%20Tables

Our BooksAuthored Table links the Books and Authors Tables

Because a book can have many authors and an author can write many books, neither column here is unique. This is by design

Starting with the ISBN for any given book, we can then look in our Linking Table. Each row with a matching ISBN points to an author of that book.

Using this approach means we don’t have to make any changes to the structure of our tables as we add multiple authors. For each additional author, we just add another row to our Linking Table

Our Relational Data Structure

Our full data structure for storing this simplified example would look something like this:

Our full data structure

Challenges with Relational Tables

Working through these challenges makes something very clear:

When new business requirements arise, these could require changes in our underlying data model. As we saw above: even with a simplified scenario – changing data models is not a trivial task

We can’t change the set of columns for just a single row – all rows within a table use the same set of columns

Business%20requirements%20often%20require%20changes%20to%20our%20data%20models

Business requirements often require changes to our data models

Depending on the scope of the change required, we could be looking at adding columns to a single table (like we did with Author ID and Publisher ID), or more complex changes requiring re-mapping our data across a number of tables

In this case, we’ll likely need to update the logic that writes to and reads from these tables – everywhere we use them

Does this mean that storing data in RelationalStores is a bad idea? Absolutely not. There are benefits to storing data in this way – we often want to enforce a known, fixed schema and we want any variation to be considered and approved. This is where Relational Data storage might be the best fit

Document Stores

On the other hand, there are times where the fixed approach of predefined columns and tables isn’t the best way to store our data

Maybe we want to store nested data – for example, our books where we can have any number of authors – without splitting the data between different tables

Perhaps we’re receiving data from Cloud APIs: if we’re getting data from Ariba’s APIs, the data isn’t returned to us as entries ready to be stored in a table. Instead, we’ll likely get results formatted in something called JSON (JavaScript Object Notation)

JSON Documents

The term JSON Document is used to describe either JSON Objects or JSON Arrays. We’re only going to describe JSON Objects in this blog post

Inside our JSON Objects, we have a number of Key/Value Pairs. For example, [Book] Title is a Key, and the Value for this might be Document Storage. Let’s revisit our earlier example

Relational%20%28table%29%20structure%20vs%20JSON%20Document%20structure

Relational (table) structure vs JSON Document structure

Above we can see how the properties of a book could be modeled in either a Table or JSON Document. Our JSON Document is an object within curly braces { }, made from a number of Key/Value Pairs

The JSON Document is self-describing. That is – if we retrieve the book data, we can check the keys to tell what each value represents. If we retrieve a single row of data from a table, the column names aren’t included by default

Retrieving%20Book%20Data%20from%20Relational%20vs%20Document%20Stores

Retrieving Book Data from Relational vs Document Stores

With our small number of columns, this is workable. Are we going to be able to remember fifty columns? A hundred? Nobody has that kind of patience

JSON Collections

Because our JSON Documents store the structure alongside the data,we can easily vary the structure between entries. Unlike rows stored in a table, JSON Documents are stored in something called a JSON Collection

To help us understand this, we can picture each JSON Document as a physical sheet of paper with our data written on it

Working with a blank piece of paper, we can write our data exactly as it is – we don’t need to stick to a fixed format

Loose sheets of paper aren’t great for storage and retrieval, so we store our related documents in a Manila Folder. This folder is our JSON Collection

JSON Documents and Collections

When you store a JSON Document, you put it in a related JSON Collection. When you want to access the Document later, you retrieve it from the same Collection

Using SAP HANA Cloud Document Store, we can do this using a modified SQL Syntax. But that’s a story for another, more technical blog post

Document Storage and Retrieval

SAP HANA Cloud Document Store is an additional feature that can be activated within an existing SAP HANA Cloud instance – for more information you can read the blog post linked under Next Steps below

Conclusion

In this blog post we’ve discussed Relational Storage using Tables, including the benefits and limitations of working with a fixed, known data structure

We’ve also discussed how we can work with data that doesn’t conform to a fixed structure using JSON Documents, stored inside JSON Collections

As always, the practical reality is more complex than the theory. Hopefully this has been a useful high level introduction to when you might use a Document Store like SAP HANA Cloud Document Store

I welcome any questions or comments below

Next Steps

For more Practical Application:

Maybe you’d like to find out how you can activate SAP HANA Document Store within SAP HANA Cloud

Enabling JSON Document Store by Cameron Swift

Perhaps you’ve already activated SAP HANA Document Store and want to get some experience with inserting and selecting JSON Documents

HANA DocStore First Steps by Cameron Swift

For more Theory:

You may have heard the term NoSQL Databases before. Document Store is a type of NoSQL Database, but isn’t the only type. Perhaps you’d like to learn about another type of NoSQL Database, the Graph Database

SAP HANA Database as a Graph Store – Introduction by Poornapragna Malamandi Suresh