Optimizing the Balance Between Performance and Cost with GigaSpaces v15.8


Latest release significantly reduces memory footprint and infrastructure costs, and boosts digital application performance. 

2020 has been a tough year all around. With tight budgets and limited resources, enterprises are looking to optimize infrastructure TCO while at the same time accelerating digital transformation. According to the 2021 Gartner Board of Directors Survey, 69% of boards of directors accelerated their digital business initiatives following the COVID-19 disruption. Such digital business initiatives require the speed, scale and agility to handle the ever-growing amount of business data. 

With this in mind, I am excited to announce today the release of GigaSpaces v15.8, which offers advanced functionality to significantly reduce memory footprint, cut infrastructure costs, and boost digital application performance. The three main pillars of InsightEdge v15.8 are:

  1. Reducing RAM footprint: optimizing in-memory data store RAM to reduce hardware costs by up to 70% while retaining blazing performance
  2. Boosting SQL query performance with smart data locality: 10x faster response time on reporting and BI compared to previous releases
  3. Cloud-native lifecycle management: enabling agile deployment of new versions of data services with no system downtime 

Smart RAM Footprint Reduction to Save up to 70% on Infrastructure Costs 

With v15.8, you can now optimize any object by simply marking it as “Storage Optimized”. This will automatically reduce the RAM storage footprint the object requires. The degree of optimization depends on the ratio of indexed properties to unindexed properties (fields). The more unindexed properties you have, the bigger the reduction will be. 


Figure 1: Screenshot showing how to optimize a property with a single-click

Consider a BI dashboard that displays data records with 100 properties, where only 20 of the 100 properties are indexed. Selecting “Storage Optimized” for all unindexed properties will reduce RAM utilization by up to 70%.
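To make this concrete, here is a minimal sketch of what such a record type might look like in the GigaSpaces Java API, under a few assumptions: the class name, field names, and the commented-out class-level annotation for storage optimization are illustrative placeholders (in practice the setting is also a single click in the UI, as shown in Figure 1); only @SpaceClass, @SpaceId and @SpaceIndex below are standard GigaSpaces annotations.

```java
import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
import com.gigaspaces.annotation.pojo.SpaceIndex;

// Hypothetical record type: a couple of indexed properties plus many
// unindexed ones, which are the candidates for optimized (packed) storage.
@SpaceClass
// @SpaceClassBinaryStorage  // assumed name of the v15.8 storage-optimization annotation -- check the docs
public class DashboardRecord {
    private String id;
    private String customer;     // indexed -- stays in its regular in-memory form
    private String region;       // indexed
    private String attribute01;  // unindexed -- benefits most from storage optimization
    // ... ~80 more unindexed String properties omitted for brevity

    @SpaceId
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    @SpaceIndex
    public String getCustomer() { return customer; }
    public void setCustomer(String customer) { this.customer = customer; }

    @SpaceIndex
    public String getRegion() { return region; }
    public void setRegion(String region) { this.region = region; }

    public String getAttribute01() { return attribute01; }
    public void setAttribute01(String v) { this.attribute01 = v; }
}
```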

Why is it important? It means significant cost savings. Here’s an example. Let’s assume 1TB of RAM costs $10 an hour (for real life pricing options, see for example AWS EC2 on-demand pricing). This amounts to $7K monthly, or $86K annually. Backup partitions would double the cost to $173K, and if you have a remote disaster recovery data center for high availability, this will set the cost at $346K. Assuming RAM footprint reduction of 50%, you will save $173K annually for every 1TB of data. This cost savings can add up fast if you utilize more than 1TB of data, or if you add additional clusters, such as a cluster in NY and a cluster in London. 
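The arithmetic behind those figures is shown in the short sketch below; the $10 per TB-hour rate and the 50% reduction are the illustrative assumptions from the example above, not quoted cloud prices.

```java
// Minimal sketch of the cost arithmetic above (illustrative figures only).
public class RamCostEstimate {
    public static void main(String[] args) {
        double hourlyPerTb = 10.0;                        // $ per TB-hour (illustrative)
        double annual = hourlyPerTb * 24 * 30 * 12;       // ~$86K per TB per year
        double withBackup = annual * 2;                   // backup partitions double it (~$173K)
        double withDr = withBackup * 2;                   // remote DR site doubles it again (~$346K)
        double reduction = 0.5;                           // assumed 50% footprint reduction
        System.out.printf("Primary: $%.0fK, +backup: $%.0fK, +DR: $%.0fK, annual saving: $%.0fK%n",
                annual / 1e3, withBackup / 1e3, withDr / 1e3, withDr * reduction / 1e3);
    }
}
```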


Figure 2: Table showing expected cost savings with optimized RAM storage and 50% RAM footprint reduction

Below are benchmark results that show the expected footprint reduction at various ratios of indexed properties. The benchmark is based on 100k objects, each with 100 string fields of length 10.


Figure 3: Benchmark showing footprint reduction for 100k objects, 100 fields of type string, length 10

When optimizing the RAM footprint, the impact on performance is shown in Figure 4 below; the difference in latency is 1-2 milliseconds on remote operations.

For optimal tuning, you can select which objects to optimize and which to leave as-is, trading off storage and performance as required. Early access to the storage optimization feature is available today.


Figure 4: Benchmark comparing query performance of optimized vs. non-optimized objects

Boosting Query Performance by More than 10X

InsightEdge now allows you to boost query performance with smart data locality using Broadcast Objects. An Object can now be designated as a “Broadcast Object” with a single click. 


Figure 5: Screenshot showing how to designate an object as a Broadcast Object with a single click

Broadcast Objects improve server-side JOIN performance by automatically replicating selected small tables of data to all the nodes in the cluster. In other words, they give you the flexibility to balance storage footprint against performance, and can improve your reporting and BI performance by 10x. The feature pays off most when you JOIN two tables where one is a large, dynamic table of transactions and the other is a small, static table that does not change frequently, such as daily exchange rates. The small static table is replicated to all nodes, but because it is small it has only a minor impact on RAM footprint. JOIN operations can then run locally on each node, significantly reducing network overhead and resource utilization, which leads to lower latency and higher concurrency. 
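As a rough sketch of how such a small reference table might be declared in the Java API, a class like the one below could be marked for broadcast; the broadcast attribute on @SpaceClass is an assumption about the v15.8 API (the same setting is available as a single click in the UI, per Figure 5), and the class and field names are illustrative.

```java
import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;

// Small, rarely-changing reference table declared as a broadcast object,
// so a full copy lives on every partition and JOINs against it stay local.
@SpaceClass(broadcast = true)   // assumed attribute name for v15.8 Broadcast Objects
public class ExchangeRate {
    private String currencyPair;  // e.g. "USD/GBP"
    private double rate;

    @SpaceId
    public String getCurrencyPair() { return currencyPair; }
    public void setCurrencyPair(String p) { this.currencyPair = p; }

    public double getRate() { return rate; }
    public void setRate(double r) { this.rate = r; }
}
```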

Let’s take a real life example. A hedge fund was querying a large table of live stock quote records, JOINed with three other static tables with additional information about the data source and the equity. When the four tables were independent sharded tables, the response time was too slow for their needs.


Figure 6: Diagram showing a query JOINing 4 tables

They then designated the three reference tables as broadcast tables, leaving only the Quote table as an independently partitioned table. Performance improved dramatically, and the queries ran 12x faster. The standard deviation also dropped, leading to more predictable performance. You can see the results in the following table, from a run with 50 concurrent users:


Figure 7: A table comparing query performance of 4 sharded tables vs. 1 sharded table + 3 broadcasted tables with 50 concurrent users
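For illustration, the query in this scenario has roughly the shape sketched below, issued over JDBC. The driver URL, table names and column names are hypothetical placeholders; the point is the structure: one large partitioned Quote table joined against three small broadcast reference tables, so each partition can resolve the JOIN locally.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the reporting query: a partitioned Quote table joined with
// three broadcast reference tables (Source, Equity, FxRate -- placeholder names).
public class QuoteReport {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:...");  // placeholder JDBC URL
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT q.symbol, q.price, s.name, e.sector, x.rate " +
                 "FROM Quote q " +
                 "JOIN Source s ON q.sourceId = s.id " +
                 "JOIN Equity e ON q.symbol   = e.symbol " +
                 "JOIN FxRate x ON q.currency = x.currency")) {
            while (rs.next()) {
                System.out.println(rs.getString("symbol") + " " + rs.getDouble("price"));
            }
        }
    }
}
```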

Cloud Native Lifecycle Management 

With v15.8, GigaSpaces adds support for a Kubernetes Operator to provide full lifecycle management for your data applications. Organizations can use Kubernetes Helm for day-1 deployment of a cluster, then use the Kubernetes Operator for day-2 management tasks. This makes it possible to deploy new data services, or new versions with business logic, to production without any downtime, and to auto-scale up or out to absorb unexpected workloads.


Figure 8: One-click service deployment

Additional Enhancements

The Confluent Hub now includes a GigaSpaces Kafka Connect sink connector. This simplifies mapping and integrating data streams, and makes it possible to consolidate multiple heterogeneous data sources into a unified in-memory Data Space.


Figure 9: Kafka Connect GigaSpaces on the Confluent Hub

Summary

With InsightEdge Version 15.8, we continue to innovate with new tools to help you balance cost and performance. You can significantly lower your infrastructure costs by reducing RAM footprint, improve query performance with object broadcasting, and streamline your deployments using the Kubernetes Operator. I invite you to experience it for yourself today. You can download v15.8 here for a free trial.
