Where Teradata could go with its data lakehouse

Last week Teradata offered its long-awaited response to the emergence of the data lakehouse. As VentureBeat’s George Lawton reported, Teradata has always differentiated itself by stretching the capabilities of analytics, first with massively parallel processing on its own specialized machines, and more recently, with software-defined appliances tuned for variations in workloads, from compute-intensive to IOPS (input/output operations per second)-intensive. And since acquiring Aster Data Systems over a decade ago, Teradata has morphed from solving big analytics problems to solving any analytics problem, with a diverse portfolio of analytic libraries that stretch SQL into new areas such as path or graph analytics.

With the cloud, we’ve been waiting for when Teradata would fully exploit cloud object storage, which is the de facto data lake. So the dual announcements last week of VantageCloud Lake Edition and ClearScape Analytics were logical next steps on Teradata’s journey to the data lakehouse. Teradata is finally making cloud storage a first-class citizen and opening it up to its wide analytics portfolio.

But unlike Teradata’s previous moves to parallelized and polyglot analytics, where it led the field, this time with the lakehouse, it has company. The announcement might not have mentioned the lakehouse word, but that’s what it was all about. As we noted several months back, almost everyone in the data world including Oracle, Teradata, Cloudera, Talend, Google, HPE, Fivetran, AWS, Dremio and even Snowflake has felt compelled to respond to Databricks, which introduced the data lakehouse.

Teradata’s path to the data lakehouse

Nonetheless, Teradata approaches the data lakehouse with some unique twists, and its approach is all about optimization. Teradata’s secret sauce has always been highly optimized compute, interconnects, storage and query engines, along with workload management designed to run compute resources at up to 95% utilization. When commodity hardware got good enough, Teradata introduced IntelliFlex, where performance and optimizations could be configured through software. The capability to optimize for hardware not invented here opened the door to Teradata optimizing for AWS and, down the road, the other hyperscalers.

Teradata introduced VantageCloud a year ago, and late last year ran a 1,000+ node benchmark that no other cloud analytics provider has so far matched. But this was for a more conventional data warehouse using customary block storage.

The complication in making the lakehouse happen was developing a table format for data sitting in cloud object storage. Such a format brings all the niceties associated with data warehouses: ACID transactions, which are key to ensuring consistency of data, more granular security and access controls, and raw performance. Databricks fired the first shot with Delta Lake, and more recently, other providers ranging from Snowflake to Cloudera have embraced Apache Iceberg, the common thread being that this is all based on open source technology. For Lake Edition, Teradata went its own way with its own data lake table format, which the company claims delivers superior performance compared to Delta and Iceberg.

The other side of the lakehouse coin is software. Aside from its SQL engine, which has been designed to handle large, complex queries that can join up to hundreds of tables, Teradata has a large portfolio of analytic libraries that run in-database. This has been one of Teradata’s best-kept secrets. Largely the legacy of the Aster Data acquisition over a decade ago, these analytics were specially tuned to exploit the underlying parallelism, and they went well beyond SQL, encompassing functions such as n-Path, graph, time series analysis, and machine learning, all accessed through SQL extensions.

Formally branding the portfolio as ClearScape Analytics, Teradata is finally drawing attention to the fact that it is a holistic analytics platform and not simply a data warehouse, data lake or lakehouse. As part of the announcement, Teradata beefed up the time series and MLOps content. But when it comes to the data lake, data scientists are very opinionated about choosing their own languages and tools. And so, VantageCloud will also support a bring-your-own-analytics option for those preferring to write Python and work from Jupyter notebooks or their own workbenches, and currently has integrations with Dataiku, KNIME and Alteryx. ClearScape Analytics will be available for both VantageCloud Lake Edition and the standard Enterprise Edition.

Lake Edition and ClearScape Analytics are promising starts for Teradata as a data lakehouse. There’s little question that Teradata’s scale and support of polyglot analytics made the lakehouse a question of when, not if. And branding the analytics portfolio is more than just a marketing exercise, as it finally shines the spotlight on what had been a well-kept secret: Teradata’s differentiation goes beyond the optimized SQL engine and infrastructure to include analytics optimized for that engine. VantageCloud takes the analytics portfolio full circle by unleashing it on cloud object storage and, with usage-based pricing, potentially opens it up for more discretionary workloads compared to the days when customers were running on-premises with firm ceilings on capacity.

A wish list for Teradata

That leaves our wish list for what Teradata should do next. In summary, we want to see Teradata venture further out of its comfort zone to draw new audiences of users. Admittedly, with the lakehouse, the challenge is not unique to Teradata, as Databricks, for example, looks to draw in business analysts while Snowflake courts data scientists.

To draw that new audience, Teradata should lower entry barriers and put open source on a more level footing with its proprietary environment. With Lake Edition, Teradata has dramatically lowered its entry pricing to $5,000/month. That is a marked drop from the six- and seven-figure annual contracts that Teradata customers typically pay, but we’d like to see Teradata go further with a freemium offering that allows new users to kick the tires. Heck, even incumbents not known for discount pricing like Oracle have embraced free tiers.

As for open source, there are a couple of pathways that we’d like to see Teradata further develop. The first is drawing non-Teradata users to ClearScape Analytics through optimized APIs to open source Delta and/or Iceberg data lakes. While performance might not be on par with Teradata’s own data lake table format, it could be made “good enough.”

Conversely, we’d like to see parallel efforts with so-called BYO analytics, drawing the Python crowd through optimized APIs with Teradata’s own data lake table format. For instance, we would like to see Teradata team up with Anaconda to juice performance of the Conda Python library portfolio, much as Anaconda is already doing with Snowflake. At the end of the day, it’s all about the analytics.
