Source: https://scaleoutsean.github.io/2023/01/12/beegfs-eseries-hybrid-cloud-spot-ocean-spark.html

Burst on-prem GPU workloads from BeeGFS/E-Series clusters to Spot Ocean for Spark in the cloud

12 Jan 2023 -

3 minute read

Problem statement

Enterprises with analytics, HPC and Deep Learning workloads that have high-bandwidth storage requirements use BeeGFS with NetApp E-Series.

For various reasons they may need to burst-to-cloud. Some of the main challenges in this process:

  • Data replication from on-premises BeeGFS to the cloud
  • Storage performance in the cloud
  • Cost of compute resources in the cloud

Data replication

For obvious reasons (granularity), file and object replication are generally a better choice than volume replication in this use case.

To copy BeeGFS files to the cloud you may use a file sync tool of your choice: rsync, rclone, etc.

Alternatively, NetApp has a subscription (charged per hour) service called Cloud Sync.

I wrote about various ways to sync files and objects here. If you use Cloud Sync, automation is available via the Cloud Sync API.

Data replication from the cloud to on-premises is usually not a problem because we’re talking just about the results (a few KB to a few GB, perhaps). To avoid having to open the enterprise firewall to incoming connections (or, even worse, use a VPN), simply post your results to the cloud provider’s Object Store and download them from there using Cloud Sync or rclone.
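As a sketch of both directions with rclone: `cloud:` below is assumed to be an rclone remote you have already configured (S3, Azure Blob, GCS, etc.), and the bucket and path names are placeholders.

```sh
# Push input data from the on-prem BeeGFS mount to cloud object storage.
# Parallelism flags help on a high-bandwidth BeeGFS filesystem.
rclone sync /mnt/beegfs/datasets cloud:burst-input/datasets \
  --transfers 32 --checkers 64 --fast-list --progress

# Later, pull only the results back on-premises.
# This is an outbound connection, so no inbound firewall holes are needed.
rclone copy cloud:burst-output/results /mnt/beegfs/results --progress
```

`sync` makes the destination match the source (it deletes extraneous files), while `copy` only adds or updates files, which is the safer choice for pulling results.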

Storage

For Big Data analytics and DL/ML workloads it usually pays to use fast storage because that saves compute costs. Cloud GPUs aren’t exactly cheap, so if you use BeeGFS on-premises, you likely want to use it in the public cloud for similar workloads.

The creators of BeeGFS, ThinkParQ, have created BeeOND, a subscription service that’s based on BeeGFS running on hyperscaler hardware. Back in 2019 it was possible to get close to 100 GiB/s from such clusters (see this example from Azure).

Compared to ONTAP-based cloud storage, BeeOND is limited in terms of data management features: backup, snapshots, etc. If you need to protect your cloud data before the BeeOND subscription is terminated, make a copy in the hyperscaler’s Object Storage.
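For completeness, this is roughly how a BeeOND instance is started and stopped with the `beeond` helper that ships with BeeGFS; the node list, data path, and mountpoint below are placeholders, and the exact flags should be checked against the BeeOND documentation for your version.

```sh
# Placeholder node list: one hostname per line, all reachable via ssh
cat > nodefile <<'EOF'
node01
node02
node03
EOF

# -n: node list, -d: local storage/metadata path on each node, -c: mountpoint
beeond start -n nodefile -d /mnt/nvme/beeond -c /mnt/beeond

# ... run workloads against /mnt/beeond ...

# Tear the instance down when done (on-demand data is discarded,
# so copy anything you need to Object Storage first)
beeond stop -n nodefile -L -d
```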

CSI drivers
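
The workflow later in this post deploys the BeeGFS CSI driver into the Kubernetes cluster so that pods can mount the BeeGFS/BeeOND filesystem. A minimal sketch of a StorageClass and PVC for it might look like this, assuming the driver (`beegfs.csi.netapp.com`) is already installed; the management host IP and base path are placeholder assumptions.

```sh
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: beegfs-dyn
provisioner: beegfs.csi.netapp.com
parameters:
  sysMgmtdHost: "10.0.0.10"     # placeholder: BeeGFS/BeeOND management node
  volDirBasePath: k8s/dyn       # base directory for dynamically provisioned volumes
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: beegfs-scratch
spec:
  accessModes:
    - ReadWriteMany             # BeeGFS supports shared access across nodes
  storageClassName: beegfs-dyn
  resources:
    requests:
      storage: 100Gi
EOF
```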

GPU compute nodes

We want to avoid unnecessary cost of GPU compute resources.

To do that we can use a Spot.io service called Spot Ocean for Spark.

Spot Ocean for Spark

Spot doesn’t seem to build GPU clusters from scratch, but it does allow you to import existing clusters and let Spot control and manage them.

This means we can build a cluster with GPU-based worker nodes and tell Spot Ocean for Spark to use it.

The next question is whether the containers used by Spot Ocean for Spark have CUDA. Spot Ocean for Spark uses its own images; or, to be perfectly correct, containers based on its base images (this will come in useful later).

At the time of writing this post, the images don’t seem to have CUDA libraries in them. Because we don’t have to use Spot’s stock images and can use custom Docker images based on Spot’s base images, we can easily start with those and create custom containers with CUDA drivers.

Here’s information about the official Spark images which can be used as a base, and this is the list.
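A sketch of the custom-image approach: the base image tag below is a placeholder (pick one from Spot’s published list), the CUDA package version is an example that should match what your hyperscaler’s GPU nodes recommend, and the build assumes the NVIDIA apt repository is configured in the base image or added beforehand.

```sh
cat > Dockerfile <<'EOF'
# Placeholder: substitute one of Spot's Ocean-for-Spark base images
FROM <spot-ocean-spark-base-image>

USER root
# Install CUDA runtime libraries (example version; assumes the NVIDIA
# apt repository is available in the image)
RUN apt-get update && \
    apt-get install -y --no-install-recommends cuda-runtime-11-8 && \
    rm -rf /var/lib/apt/lists/*
EOF

# Build and push to the private registry the cluster can pull from
docker build -t my-registry.example.com/spark-cuda:latest .
docker push my-registry.example.com/spark-cuda:latest
```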

Performance monitoring

For short-lived clusters I’d probably use the hyperscaler’s monitoring and the CLI tools built into BeeGFS. Why?

  • Cost optimization is done by Spot
  • The cluster will be deleted anyway

Alternatively, use the BeeGFS monitoring plugin for Grafana (running on-premises or in the free Grafana Cloud tier); either would connect to a small, long-running InfluxDB v1 container that could be left running if bursting to the cloud happens frequently enough.
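A sketch of that long-running InfluxDB v1 container, with the beegfs-mon side shown only as commented configuration; the hostname is a placeholder and the exact key names should be verified against the BeeGFS documentation for your release.

```sh
# Small, long-running InfluxDB v1 instance for beegfs-mon to write to
docker run -d --name beegfs-influx \
  -p 8086:8086 \
  -v influx-data:/var/lib/influxdb \
  influxdb:1.8

# beegfs-mon (on the BeeGFS management/monitoring node) then points at it,
# e.g. in /etc/beegfs/beegfs-mon.conf:
#   dbType     = influxdb
#   dbHostName = <influx-host>    # placeholder
#   dbHostPort = 8086
#   dbDatabase = beegfs_mon
```

Grafana (on-premises or Grafana Cloud) would then use this InfluxDB as its data source for the BeeGFS dashboards.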

Workflow

The entire workflow would look like this:

  • Preparation
    • Build images with CUDA version recommended by hyperscaler and store them in private registry
  • Replication
    • Stand up a minimal BeeOND cluster
    • Replicate data to BeeOND
  • Compute
    • Grow the BeeOND cluster to the required number of nodes
    • Stand up a Kubernetes cluster with GPU-based nodes and deploy BeeGFS CSI
    • Import the cluster to Spot Ocean for Spark
    • Run Spark and other workloads
  • Terminate temporary environment
    • Copy results to Object Storage or back to on-premises
    • Scale down to zero or destroy Kubernetes cluster
    • Destroy BeeGFS cluster

Users who burst to the cloud frequently could scale clusters up and down rather than re-create them every time.
