

 1 year ago
source link: https://blogs.sap.com/2023/02/27/harnessing-multi-model-capabilities-with-spotify-processing-semi-structured-data-with-sap-hana-cloud-sap-data-warehouse-cloud-part-1/

Harnessing Multi-Model Capabilities with Spotify – Processing Semi-Structured Data with SAP HANA Cloud/ SAP Data Warehouse Cloud – Part 1

This is the first in a series of blogs explaining the multi-model capabilities of SAP HANA Cloud / SAP Data Warehouse Cloud through a single scenario. So grab a coffee before you go through the series, which covers architecture, solutioning, implementation with code repositories and, finally, an extended version as an SAP Discovery mission (to be published).

We have published discovery missions for SAP Business Technology Platform [SAP BTP] adoption focusing on SAP HANA Cloud and SAP Data Warehouse Cloud [SAP DWC], based on customer reference use cases. When a use case is that specific, it may not cover all the latest features released on our platform. So this time we planned to cover those features with an interesting scenario that might spark your interest in exploring all the multi-model capabilities [hopefully!]. We will not restrict ourselves to BTP components and will also discuss deployment options using hyperscaler services and open source frameworks.

If you would rather go straight to the hands-on blog, you can check here.

Why Spotify?

Spotify is one of the most successful audio streaming and media service providers, and it focuses on the real “Customer Experience”, or rather, it is customer obsessed. They make the customer experience as personal as possible [at least from my experience 😊 ]. The main reasons for choosing Spotify are:

  1. Spotify is a prime example of a microservices-based architecture that supports and scales for more than 205 million premium subscribers and 295 million ad-supported monthly active users.
  2. It has a vibrant developer community and provides interesting APIs and packages for different technology stacks [Web APIs / Spotipy / Spotify R].
  3. Their focus on research areas such as Audio Intelligence, Human Computer Interaction, Algorithmic Responsibility and User Modeling, and the corresponding publications, tells us how serious they are about customer experience as well as the music industry.
  4. Finally, we wanted a scenario that members of our SAP community, at different expertise levels, can relate to, learn from and adapt to their own area of expertise.

Now, with this introduction to the Spotify Developer platform, let’s get into the scenario: how we plan to use SAP HANA Cloud / SAP DWC to consume their APIs and process the semi-structured data, or JSON.

The Beginning

So, what’s the scenario?

  1. Spotify releases Top charts for every country, available as weekly or daily playlists. So, let’s consider the weekly playlists for, say, 10 countries.
    Spotify Weekly Chart
  2. Using the Spotify APIs, we are going to collect the weekly Top chart for 10 countries and store it as JSON in HANA Cloud. Using identifiers, we will extract attributes such as track name, artist name, album name and popularity.
  3. For all the songs extracted in step 2, we will use the Spotify audio features API to extract metrics such as danceability, speechiness and liveness. We will explain the metrics in later steps. All these metrics will be stored as JSON too.
  4. Create a SQL view merging the two JSON collections acquired in steps 2 and 3.
  5. Based on the SQL views, we will create Calculation views to understand which songs from the different playlists have higher danceability and energy. We will perform R computations to group by country playlist and understand the features. The metrics can then be consumed via SAP Analytics Cloud or other tools such as Microsoft Power BI.
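To make steps 2 to 4 more concrete, here is a minimal Python sketch of the data shapes involved. It assumes the playlist and audio-features payloads follow the public Spotify Web API layout (`items[].track` and a list of feature objects keyed by `id`). The function names `flatten_tracks` and `merge_features` are hypothetical, and the in-memory merge only illustrates what the SQL view in step 4 joins; it is not the view itself.

```python
from typing import Any, Dict, List, Optional


def flatten_tracks(playlist_page: Dict[str, Any], country: str) -> List[Dict[str, Any]]:
    """Extract the attributes named in step 2 from one playlist-items page."""
    rows = []
    for item in playlist_page.get("items", []):
        track = item.get("track")
        if not track or not track.get("id"):
            continue  # skip entries without a usable track object
        rows.append({
            "country": country,
            "track_id": track["id"],
            "track_name": track["name"],
            "artist_name": ", ".join(a["name"] for a in track["artists"]),
            "album_name": track["album"]["name"],
            "popularity": track["popularity"],
        })
    return rows


def merge_features(
    tracks: List[Dict[str, Any]],
    features: List[Optional[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Join track rows with audio features on the track id (steps 3 and 4)."""
    by_id = {f["id"]: f for f in features if f}
    wanted = ("danceability", "energy", "speechiness", "liveness")
    merged = []
    for row in tracks:
        feat = by_id.get(row["track_id"], {})
        # Tracks without features keep the columns, with None as the value.
        merged.append({**row, **{name: feat.get(name) for name in wanted}})
    return merged
```

Each merged row then carries both the descriptive attributes and the audio metrics, which is the shape the Calculation views in step 5 would work on.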

 

Scenario-2.png

There are additional scenarios, which we will explain in the architecture section and cover as part of the Discovery Missions.

Now the Architecture – SAP HANA Cloud 

Consider this architecture for consuming Spotify APIs with SAP HANA Cloud; let me explain it in detail. While explaining the scenario, I will refer to the numbers I have added to the architecture diagram.

SpotifyHC_Final.png

Spotify provides Web APIs [1] to consume public playlists, tracks, artists, albums and podcasts, and to extract audio features for all the tracks. To consume these APIs, I will use Python and the Spotipy package. Its maintainers have already shared plenty of sample code snippets showing how to authenticate and call the APIs for all scenarios. I just put the pieces together to get the valid JSON structure that we need in order to store it as a collection in SAP HANA Cloud. I then added separate functions to loop over the playlists, capture the audio features of every track in every playlist and store them in a separate collection.
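The loop over playlists and audio features described above could look roughly like the sketch below. It is not the author's script. It assumes the Spotipy methods `playlist_items`, `next` and `audio_features`, plus the client-credentials flow with `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` set in the environment. The helper names are hypothetical, and the batch size of 100 reflects the limit of the audio-features endpoint.

```python
from typing import Any, Dict, Iterator, List


def make_client():
    """Build a Spotipy client with the client-credentials flow.

    Spotipy reads SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET from the
    environment when no credentials are passed explicitly.
    """
    import spotipy  # imported lazily so the helpers below do not require it
    from spotipy.oauth2 import SpotifyClientCredentials

    return spotipy.Spotify(auth_manager=SpotifyClientCredentials())


def iter_playlist_items(sp, playlist_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every item of a playlist, following the paging links."""
    page = sp.playlist_items(playlist_id, limit=100)
    while page:
        for item in page["items"]:
            yield item
        # A page without a "next" link is the last one.
        page = sp.next(page) if page.get("next") else None


def fetch_audio_features(sp, track_ids: List[str], batch_size: int = 100) -> List[Any]:
    """Request audio features in batches of at most batch_size ids."""
    features: List[Any] = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        features.extend(sp.audio_features(tracks=batch))
    return features
```

With real credentials, `sp = make_client()` followed by `list(iter_playlist_items(sp, playlist_id))` would return the raw items for one country chart, where `playlist_id` is the identifier of that chart playlist.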

Let us now look at the different deployment options we can consider for ingesting JSON documents into SAP HANA Cloud. Here are the options:

Data Ingestion – Option 1

To validate that the ingestion works, you can execute the scripts [2.1] directly from Google Colab, Visual Studio or any Python environment of your choice. The code can also be executed from a VM or any other compute used for data extraction and ingestion. Basically, you will call the APIs using the Spotipy library and use the hana_ml library to insert the captured JSON documents as collections [4].
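The blog relies on hana_ml for this insert. As a library-neutral illustration of what ends up in the Document Store, the sketch below issues plain SQL through a DB-API cursor instead (for example one obtained from the hdbcli driver). That swap is my assumption, not the author's code. It also assumes the JSON Document Store is enabled on the instance and that each document is a JSON object. The helper names are hypothetical.

```python
import json
from typing import Any, Iterable

INSERT_TEMPLATE = 'INSERT INTO "{name}" VALUES(\'{payload}\')'


def collection_insert_sql(collection: str, document: Any) -> str:
    """Build an INSERT statement for one JSON document."""
    # Single quotes inside the JSON text are doubled so the SQL literal stays valid.
    payload = json.dumps(document, ensure_ascii=False).replace("'", "''")
    name = collection.replace('"', '""')
    return INSERT_TEMPLATE.format(name=name, payload=payload)


def insert_documents(cursor, collection: str, documents: Iterable[Any],
                     create: bool = False) -> int:
    """Optionally create the collection, then insert every document."""
    if create:
        cursor.execute('CREATE COLLECTION "{}"'.format(collection.replace('"', '""')))
    count = 0
    for doc in documents:
        cursor.execute(collection_insert_sql(collection, doc))
        count += 1
    return count
```

Building the statement as a string keeps the sketch independent of any driver; whether the driver also accepts a bound parameter for the document is something to verify in your own environment.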

SpotifyHC_Final_Option1.png

Data Ingestion – Option 2

Use the same Python script, but containerize it using Docker and deploy it using SAP Kyma [2.2]. Then schedule it to run every week so that you can collect the weekly playlists for different countries and ingest them into SAP HANA Cloud [4]. I have already explained how to containerize and deploy in this blog. You just need to follow the same steps for the current scenario.

SpotifyHC_Final_Option2.png

Data Ingestion – Option 3

If you want to explore similar services from hyperscalers, you can also take the same Python code, push it to a git repo, and use Google Cloud Platform App Engine [2.3] (Standard/Flex) to schedule the data ingestion into SAP HANA Cloud [4]. I have already explained how to containerize and deploy in this blog. You just need to follow the same steps for the current scenario.

SpotifyHC_Final_Option3.png

And of course you could think about different scenarios for data ingestion using Integration Suite or SAP Data Intelligence.

Once you have decided on a deployment option (2.1-2.3), all the responses from the Spotify APIs will be stored as JSON documents in HANA Cloud. They will be stored as collections under a schema. If you are scheduling the ingestion using SAP Kyma or GCP App Engine, you can name each JSON collection by date or week before ingesting it with the hana_ml package.
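One possible way to derive such a name is sketched below. The naming pattern (prefix, country code, ISO year and week) and the function name `weekly_collection_name` are assumptions for illustration; the blog does not prescribe a format.

```python
import datetime as dt
import re


def weekly_collection_name(prefix: str, country: str, day: dt.date) -> str:
    """Return a name such as TOP_CHART_DE_2023_W09 for the ISO week of `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    raw = f"{prefix}_{country}_{iso_year}_W{iso_week:02d}"
    # Keep only characters that are safe in an unquoted SQL identifier.
    return re.sub(r"[^A-Za-z0-9_]", "_", raw).upper()
```

Using the ISO year rather than the calendar year keeps the first days of January in the same collection as the rest of their chart week.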

Now that we have the top songs and audio features from all the charts by country, we can create calculation views and consume the data using SAC [6]. We can include R computations with the server provided in SAC or by connecting a local R server. Or you could consume the data using Microsoft Power BI too [7]. I am using Power BI because it has built-in features to display images on the fly in a table.

There are other scenarios that we can consider after step 4, the ingestion of API responses into HANA Cloud. Consider the architecture flow 4 -> 8 -> 9. Once we have the data ingested, we can compare the Top Charts playlists with your own private playlists using the Graph Engine [8]. We can compare both collections, see which artists and songs they have in common, and display the corresponding metrics in an app based on SAP Cloud Application Programming or SAP Build. Or we can expose the graph-based computational data through the Python-based web framework Django.

What about the Architecture with SAP Data Warehouse Cloud(SAP DWC)?

The architecture flow is almost the same as for SAP HANA Cloud, apart from some minor changes. I will explain only the differences and the steps that differ from SAP HANA Cloud. As you can see in the architecture below, SAP DWC does not enable the Document Store capabilities (as of now) to store data as a JSON collection. Instead, we will ingest the data as a large object within the HANA Cloud tenant of SAP DWC. We do have the standard JSON-to-SQL function, which we can apply to the large object to extract the response. The result is the same as what we extracted from the HANA Cloud JSON collection.

SpotifyDWC-2.png

Here are the steps to be followed for SAP DWC:

  • Consume the Spotify APIs using the Python script directly [2] or through SAP Kyma [2.1].
  • We will use the Python package hdbcli to ingest the data as a large object (NCLOB).
  • Create Open SQL schema access from an SAP DWC space [3].
  • Ingest the JSON response with data type “NCLOB” into a table [3].
  • To ingest the data, we can use SAP Kyma [2.1] or run the Python script [2] directly from any VM or compute.
  • To compare the Top Chart playlists with your private playlist, we can use the graph engine as we did for SAP HANA Cloud. For the development process on how to build on the underlying HC tenant, please refer to my blog.
  • You can use the Data Builder feature to blend data from different playlists [5].
  • You can expose the data either to SAC [6] or to non-SAP reporting tools [8].

You can also expose the same data for building apps using SAP CAP [7], SAP Build [7] or an external web framework such as Django [9].
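The NCLOB ingestion step from the list above can be sketched with any Python DB-API connection. On the HANA side this would be a connection opened with `hdbcli.dbapi.connect(...)` against the Open SQL schema. The table name `SPOTIFY_RAW`, its columns and the function `store_response` are hypothetical, chosen only for this example.

```python
import json
from typing import Any

# Hypothetical staging table: one row per country chart and week,
# with the raw API response kept as an NCLOB.
CREATE_SQL = (
    "CREATE TABLE SPOTIFY_RAW ("
    "CHART_WEEK NVARCHAR(10), COUNTRY NVARCHAR(2), PAYLOAD NCLOB)"
)
INSERT_SQL = (
    "INSERT INTO SPOTIFY_RAW (CHART_WEEK, COUNTRY, PAYLOAD) VALUES (?, ?, ?)"
)


def store_response(conn, chart_week: str, country: str, response: Any) -> None:
    """Serialize one API response and insert it as a single large-object row."""
    cursor = conn.cursor()
    try:
        # Bound parameters avoid quoting problems in large JSON payloads.
        cursor.execute(INSERT_SQL, (chart_week, country, json.dumps(response)))
        conn.commit()
    finally:
        cursor.close()
```

Once the rows are in place, the JSON-to-SQL function mentioned earlier can be applied to the `PAYLOAD` column to extract the same attributes as from a collection.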

Once you go through one of the scenarios, say reporting based on the top tracks of different countries, you will be able to compare the Danceability metric (R computation) and visualize the data in SAC as seen below. The visualization below compares the top 500 songs from 10 different countries based on the Danceability metric, which describes how suitable a track is for dancing based on tempo, rhythm stability, beat strength and overall regularity. You feeling it? 🙂

Rviz1.png

And here is another metric, Speechiness, compared across all tracks grouped by country, as shown below [R computations using the Spotify R package]. All the code that I used for Spotify R is already available in the GitHub repositories of different open source developers. I just modified it for these specific scenarios and will add the references in my hands-on blogs.

Rviz2.png

And here is the visualization for the same data in Power BI. Here we compare the danceability and energy metrics for all tracks from 10 different countries. Power BI also has a cool feature for displaying images on the fly as part of a table. And yeah, Taylor Swift’s new album is damn cool! Still feeling it?

powerbi.png

What will you learn?

Based on the kind of deployment you test or try hands-on, you will learn some or all of the topics mentioned below 😊

  • SAP HANA Cloud Basics
  • HANA Multi-model capabilities [Document Store / Graph / Modeling / CAP]
  • Development using Business Application Studio
  • Python Basics & Development using Visual Studio
  • Docker Set up and configuration and Docker hub basics
  • SAP BTP Kyma Deployments
  • GCP basics and App Engine Deployments
  • SAP Data Warehouse Cloud Basics and Integration with HDI
  • SAP Build Integration with SAP HANA Cloud
  • SAP Analytics Cloud Introduction and Basic Story Board Developments
  • Programming based on Django Framework

We went through different architecture options using SAP HANA Cloud / SAP DWC. I hope this gives you a high-level overview of how to process semi-structured data. Now let’s focus on the implementation part. In the following blogs, I will discuss ingesting semi-structured data into SAP HANA Cloud / SAP DWC and consuming it for reporting. We will also cover scenarios focusing on the Graph Engine, comparing public and private playlists and consuming the results in apps based on SAP CAP or SAP Build Apps. And finally, we will embed web players in Django with SAP HANA Cloud as the backend and extract metrics based on the songs played.

Please do let us know your feedback. If the architecture and deployment options interest you, kindly continue with part 2, where we discuss the implementation based on SAP HANA Cloud. And if you are looking for access to SAP HANA Cloud, you can sign up individually using the SAP free tier offerings.

Happy Learning !

