Table of Contents
- Executive Summary
- Data Lake and Lakehouse Sector Brief
- Decision Criteria Analysis
- Analyst’s Outlook
- Methodology
- About Andrew Brust
- About GigaOm
- Copyright
1. Executive Summary
Data lakehouses are platforms intended to combine the flexibility of data lakes with the governance, structural optimizations, and query processing technologies of data warehouses. Proponents consider this combination of data lake and data warehouse technologies to be the optimal blend for facilitating and managing data analytics.
There are a number of key technologies enabling the data lakehouse paradigm. These include:
- Open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi, as well as the Apache Parquet columnar data file format that usually underlies all three. These formats and their accompanying technology are intended to bring structure to the data lake, aid query performance, and facilitate atomicity, consistency, isolation, and durability (ACID) guarantees.
- An analytics query engine that allows analytics to be performed across a broad, distributed variety of data without first applying extensive transformations to that data.
- Query accelerations within such engines, which often include in-memory caching, indexing, and vector processing on central processing units (CPUs).
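As a rough illustration of how a table format can layer atomic commits and consistent reads over immutable, column-oriented data files, consider the minimal sketch below. It is a toy in-memory model, not the API of Apache Iceberg, Delta Lake, or Apache Hudi. The names `ToyTable`, `Snapshot`, `commit_append`, and `column_sum` are hypothetical, and the optimistic check on the snapshot ID is a simplifying assumption about how such formats handle concurrent writers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable table version: an id plus the data files it contains."""
    snapshot_id: int
    files: tuple  # each file is a dict of column name -> list of values


@dataclass
class ToyTable:
    """Hypothetical in-memory table; not the API of any real table format."""
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit_append(self, expected_id: int, new_file: dict) -> bool:
        """Append a file only if no other writer committed in the meantime."""
        head = self.current()
        if head.snapshot_id != expected_id:
            return False  # conflict: the caller must re-read the head and retry
        new_snapshot = Snapshot(head.snapshot_id + 1, head.files + (new_file,))
        self.snapshots.append(new_snapshot)
        return True


def column_sum(snapshot: Snapshot, column: str) -> int:
    """Scan a single column across files, as a columnar layout allows."""
    return sum(sum(f[column]) for f in snapshot.files)


table = ToyTable()
base = table.current().snapshot_id
first_ok = table.commit_append(base, {"order_id": [1, 2], "amount": [10, 20]})
# A second writer still holding the stale snapshot id is rejected.
stale_ok = table.commit_append(base, {"order_id": [3], "amount": [5]})
# After re-reading the head, the retry succeeds.
retry_ok = table.commit_append(
    table.current().snapshot_id, {"order_id": [3], "amount": [5]}
)
total = column_sum(table.current(), "amount")
```

Because each snapshot is immutable, a reader holding an older snapshot keeps seeing a consistent view of the table while newer commits land. Production table formats persist this metadata in object storage rather than in memory.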
For organizations that have pushed data warehouses to the limits of their versatility or that have struggled with the performance of first-generation data lakes, the modern data lakehouse provides a solution. It functions as a single platform that can store and manage widely varied types of data and enable diverse and powerful analytics over that data.
Business Imperative
An organization’s data provides no value unless that organization has a way to derive meaningful insights from it. To obtain the most value possible from their data, organizations must have control over it, and they need a way to extract meaning from it. This is especially challenging when it comes to big data, which is characterized by larger volumes, increased varieties, higher velocity, and a greater number of sources of data. Data lakehouses evolved to meet these needs. They provide a powerful, reliable, and versatile system from which organizations can manage their data and facilitate analytics across that data to power their operations and strategic decisions.
Sector Adoption Score
To help executives and decision-makers assess the potential impact and value to the business of deploying a data lake or lakehouse solution, this GigaOm Key Criteria report provides a structured assessment of the data lake and lakehouse sector across five factors: benefit, maturity, urgency, impact, and effort. By scoring each factor based on how strongly it compels or deters adoption of data lakes and lakehouses, we provide an overall Sector Adoption Score (Figure 1) for data lakes and lakehouses of 3.8 out of 5, with 5 indicating the strongest possible recommendation to adopt. This score indicates that data lakes and lakehouses are credible candidates for deployment and worth thoughtful consideration.
The factors contributing to the Sector Adoption Score for data lakes and lakehouses are explained in more detail in the Sector Brief section that follows.
Figure 1. Sector Adoption Score for Data Lakes and Lakehouses
This is the second year that GigaOm has reported on the data lake and lakehouse space in the context of our Key Criteria and Radar reports. This report builds on our previous analysis and considers how the market has evolved over the last year.
This GigaOm Key Criteria report highlights the capabilities (table stakes, key features, and emerging features) and non-functional requirements (business criteria) for selecting an effective data lake or lakehouse solution. The companion GigaOm Radar report identifies vendors and products that excel in those decision criteria. Together, these reports provide an overview of the category and its underlying technology, identify leading data lake and lakehouse offerings, and help decision-makers evaluate these solutions so they can make a more informed investment decision.
GIGAOM KEY CRITERIA AND RADAR REPORTS
The GigaOm Key Criteria report provides a detailed decision framework for IT and executive leadership assessing enterprise technologies. Each report defines relevant functional and non-functional aspects of solutions in a sector. The Key Criteria report informs the GigaOm Radar report, which provides a forward-looking assessment of vendor solutions in the sector.