Databricks targets data pipeline automation with Delta Live Tables

Senior Writer,

InfoWorld | Apr 7, 2022 5:29 am PDT

Databricks has unveiled a new extract, transform, load (ETL) framework, dubbed Delta Live Tables (DLT), which is now generally available on the Microsoft Azure, AWS and Google Cloud platforms.

According to the data lake and warehouse provider, Delta Live Tables uses a simple declarative approach to build reliable data pipelines and automatically manage the related infrastructure at scale, reducing the time data engineers and data scientists spend on complex operational tasks.

“Table structures are common in databases and data management. Delta Live Tables are an upgrade for the multicloud Databricks platform that support the authoring, management and scheduling of pipelines in a more automated and less code-intensive way,” said Doug Henschen, principal analyst at Constellation Research.

By making authoring low-code and declarative through SQL-like statements, Databricks is looking to lower the barriers to entry for complex data work such as keeping ETL pipelines healthy.
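
To make the idea of declarative authoring concrete, here is a minimal sketch in plain Python. It is not the Delta Live Tables API: the `table` decorator, the `run_pipeline` helper and the sample order data are all hypothetical names invented for this illustration. The point is only that the author declares what each table is and which tables it reads from, while a small runner works out the execution order.

```python
import inspect

# Hypothetical registry for illustration only; this is NOT the Databricks API.
_TABLES = {}


def table(func):
    """Register a function as a table definition, keyed by the function name."""
    _TABLES[func.__name__] = func
    return func


@table
def raw_orders():
    # Made-up sample data standing in for an ingested source.
    return [
        {"id": 1, "amount": 120.0},
        {"id": 2, "amount": -5.0},  # invalid row, filtered out downstream
        {"id": 3, "amount": 40.0},
    ]


@table
def clean_orders(raw_orders):
    # The parameter name declares a dependency on the raw_orders table.
    return [row for row in raw_orders if row["amount"] > 0]


@table
def order_totals(clean_orders):
    return [{"total": sum(row["amount"] for row in clean_orders)}]


def run_pipeline():
    """Build every registered table once, resolving dependencies first."""
    results = {}

    def build(name):
        if name not in results:
            func = _TABLES[name]
            deps = inspect.signature(func).parameters  # names of upstream tables
            results[name] = func(*(build(dep) for dep in deps))
        return results[name]

    for name in _TABLES:
        build(name)
    return results


if __name__ == "__main__":
    print(run_pipeline()["order_totals"])
```

The sketch leaves out everything a production framework would handle, such as cycle detection, incremental updates, data quality checks and cluster management, which is the operational work Databricks says its product automates.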

“The bigger the company, the more likely it is to be struggling with all the code writing and technical challenges of building, maintaining and running myriad data pipelines,” Henschen said. “Delta Live Tables is aimed at easing and automating much of the coding, administrative and optimization work required to keep data pipelines flowing smoothly.”

Early days for the data lakehouse

However, Henschen warned that it is still early days for combined lake and warehouse platforms in enterprise environments. “We’re seeing more greenfield deployments and experiments for new use cases rather than straight up replacements of existing data lakes and data warehouses,” he said, adding that DLT has competition from the open source Apache Iceberg project.

“Within the data management and, specifically, the analytical data pipeline arena, another emerging option that’s getting a lot of attention these days is Apache Iceberg. Tabular, a company created by Iceberg’s founders, is working on delivering the same benefits of low-code development and automation,” Henschen said.

Iceberg got a major endorsement this week, with Google Cloud embracing this open source table format as part of the preview of its new combined data lake and warehouse product, called BigLake.

Databricks claims that Delta Live Tables is already being used by 400 companies globally, including ADP, Shell, H&R Block, Bread Finance, Jumbo and JLL.
