Data warehouse: a quick recap

A data warehouse is a central repository for structured business data, separate from your operational systems and designed specifically for reporting and analysis. It pulls data from sources like your CRM, ERP and finance system, transforms it into a consistent structure and gives everyone in the business a single, reliable version of the truth.

Data warehouses use a schema-on-write approach: data is modelled and structured before it lands. Queries are fast, SQL-based and well-suited to the kinds of questions most business teams actually ask – revenue by region, customer retention by cohort, margin by product category. Cloud platforms like Google BigQuery, Snowflake, Amazon Redshift and Azure Synapse Analytics have made the technology accessible to businesses of almost any size.

The trade-off is deliberate constraint. A data warehouse works well for structured data and defined analytical workloads. Raw files, images, unstructured text and machine learning training datasets are not what it was built for.

What is a data lake?

A data lake takes the opposite approach. Rather than transforming data before it's stored, a data lake ingests raw data – structured, semi-structured and unstructured – and leaves the schema to be defined at query time. This is called schema-on-read.

The appeal is flexibility and cost. Object storage – Amazon S3, Google Cloud Storage, Azure Data Lake Storage – is cheap. A data lake can hold enormous volumes of data in any format: CSV files, JSON, application logs, images, audio, sensor readings, clickstream data. You store everything first and figure out how to use it later.

The problem is that "figure it out later" requires serious engineering effort. Without careful governance, data lakes become data swamps: vast stores of raw files with no reliable structure, inconsistent schemas and no clear lineage. Querying them requires significant technical expertise, and the performance on analytical workloads rarely matches a well-built warehouse. For most SMEs, a pure data lake creates more problems than it solves.

What is a data lakehouse?

A data lakehouse attempts to take the best of both architectures. It combines the flexible, low-cost storage of a data lake with the query performance, ACID transactions and schema enforcement of a data warehouse – all in a single platform.

The key enabler is a metadata and transaction layer that sits on top of object storage and adds warehouse-like behaviour. Technologies like Delta Lake (Databricks), Apache Iceberg and Apache Hudi implement this layer. They bring features that object storage alone cannot provide: transactional consistency, time travel (querying historical states of the data), schema evolution and the ability to run SQL queries at scale without sacrificing performance.

AWS Lake Formation provides a managed approach to building lakehouse architectures on S3, with centralised access controls and a data catalogue. Databricks, built around Delta Lake, has become one of the most widely adopted lakehouse platforms for organisations running significant data science and machine learning workloads alongside conventional business intelligence.

The lakehouse is genuinely useful when a business needs to support both SQL-based analytics and machine learning from the same underlying dataset – without maintaining two separate systems. A data scientist training a model and a finance analyst running a revenue report can work from the same store, with appropriate governance applied to each.

How they compare

Data warehouse Data lake Data lakehouse
Data structure Structured, schema-on-write Any format, schema-on-read Any format, schema enforced via metadata layer
Query performance Excellent for SQL analytics Poor without significant tuning Good to excellent with proper configuration
Data types Structured data only Structured, semi-structured, unstructured Structured, semi-structured, unstructured
Storage cost Moderate Low Low to moderate
ML suitability Limited – structured data only High – raw data at scale High – supports both ML and BI
Engineering complexity Moderate High High

Which architecture is right for your business?

The right answer depends on where your business is today – not where it might be in five years.

Under 50 people, or one or two data sources: a data warehouse is the right choice. It is simpler to build, cheaper to run and easier to maintain. The analytical questions most businesses at this stage need to answer are well within what a warehouse handles comfortably. BigQuery or Redshift, designed properly, will serve you well for years.

Mid-market businesses with multiple data sources but primarily structured data: a data warehouse still wins. The jump to a lakehouse is not warranted when the underlying data is structured and the analytical requirements are standard BI – dashboards, reporting, ad hoc SQL queries. The added complexity of a lakehouse does not pay off unless you have the data types and workloads that demand it.

Businesses with machine learning workloads, unstructured data or a dedicated data science team: a lakehouse starts to make sense. If you are training models on application logs, customer documents or image data alongside running conventional analytics, maintaining two separate systems becomes inefficient. A lakehouse lets both workloads coexist on the same underlying store with appropriate governance.

Pure data lake: rarely the right answer for an SME. Without significant data engineering resource and rigorous governance from day one, a data lake degrades into an unusable store of raw files. The engineering overhead is real and ongoing. Very few businesses below enterprise scale have the in-house capability to manage a data lake well.

Common misconceptions

The most common misconception about the lakehouse is that it is straightforward to adopt. It is not. Implementing Delta Lake or Apache Iceberg correctly, configuring the transaction layer, managing schema evolution and building reliable pipelines into object storage requires experienced data engineers. The platforms have matured considerably, but the operational complexity remains high compared to a managed cloud data warehouse.

The opposite misconception is that data warehouses are outdated – that cloud-native lakehouse platforms have made them irrelevant. They have not. For the majority of analytical workloads at most businesses, a well-designed data warehouse on BigQuery or Snowflake delivers better query performance, simpler governance and lower total cost than a lakehouse of equivalent capability. The warehouse is not a legacy architecture; it is the right architecture for a large proportion of businesses.

There is also a tendency to conflate the lakehouse with the data lake. They are different things. The lakehouse specifically addresses the shortcomings of the data lake – the lack of transactional consistency, the query performance problems, the governance challenges. A lakehouse is not just a data lake with better tooling; it is a distinct architectural pattern.

Route B’s view

Most UK SMEs do not need a data lakehouse. What they need is a data warehouse built properly – clean ETL pipelines, a well-designed schema, reliable refresh cycles and dashboards that the business actually trusts. That alone represents a significant improvement over where most growing businesses currently are.

The lakehouse is what comes next, when the warehouse is working and the business has data science or ML workloads that the warehouse cannot support. Reaching for a lakehouse before the warehouse fundamentals are in place is a common mistake – and an expensive one. The engineering overhead is real, and without clean, well-governed data flowing in, the lakehouse delivers no more value than the data lake it replaced.

Build the warehouse first. Get it right. Then, if the business genuinely needs the additional capabilities a lakehouse provides, you will have the clean data and the organisational maturity to build on top of it.

Thinking about your data architecture? Whether you are starting from scratch or looking to improve what you have, we can help you work out the right approach for where your business is now.

Data Warehousing Services