GLOSSARY

Data Architecture

Data architecture is the set of models, standards, and integration patterns that govern how an organization stores, moves, and secures its data end to end.

Quick answer
Data architecture is the design of how data moves, is stored, and is used across an organization. It specifies source systems, ingestion patterns, storage layers (warehouse, lake, lakehouse), transformation frameworks, consumption tools, and the governance around them. Good architecture is workload-first rather than trend-first: it is shaped by the analytical questions the business actually needs answered.

WHAT IT IS

Modern data architecture is usually a hybrid of three patterns: a data warehouse for governed analytics, a data lake or lakehouse for raw and semi-structured data, and operational data stores for real-time workloads. Inmon (top-down, 3NF) and Kimball (dimensional, bottom-up) remain the two canonical modeling traditions; Data Vault 2.0 is common in regulated, fast-changing enterprises.
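As a rough illustration of the Kimball (dimensional) tradition mentioned above, the sketch below builds a tiny star schema: one fact table joined to two dimension tables. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse, and every table name, column name, and sample row is hypothetical, chosen only for this example.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create two dimension tables and one fact table (all names are illustrative)."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL,
            region        TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240115
            calendar_date TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
            amount       REAL NOT NULL
        );
        """
    )


def load_sample_rows(conn: sqlite3.Connection) -> None:
    """Insert a handful of made-up rows so the query below has something to aggregate."""
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(1, "Acme", "EMEA"), (2, "Globex", "APAC")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?)",
        [(20240115, "2024-01-15"), (20240116, "2024-01-16")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 20240115, 100.0), (1, 20240116, 50.0), (2, 20240115, 75.0)],
    )


def sales_by_region(conn: sqlite3.Connection) -> list:
    """Aggregate the fact table by a dimension attribute, the typical star-schema query."""
    return conn.execute(
        """
        SELECT c.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    print(sales_by_region(connection))
```

The fact table holds only keys and measures, while descriptive attributes such as region live in the dimensions. An Inmon-style design would instead normalize these entities into 3NF first and derive marts like this one downstream.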

HOW IT WORKS

A data architecture document specifies source inventory, conceptual and logical models, canonical entity definitions, integration patterns (batch, streaming, CDC), data quality and observability, access and privacy controls, and change management. It is the blueprint every downstream team (BI, ML, product) builds against.
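Parts of such a document can also be kept in machine-checkable form. The sketch below is one possible shape, not a standard: it models a source inventory, the integration pattern per source, and canonical entity definitions, then checks that every entity points at a source that is actually inventoried. All class names, fields, and sample values are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class IntegrationPattern(Enum):
    """The three integration patterns named in the text."""
    BATCH = "batch"
    STREAMING = "streaming"
    CDC = "cdc"


@dataclass(frozen=True)
class SourceSystem:
    """One row of the source inventory (fields are hypothetical)."""
    name: str
    owner: str
    pattern: IntegrationPattern
    contains_personal_data: bool = False


@dataclass(frozen=True)
class CanonicalEntity:
    """A canonical entity definition and the sources that feed it."""
    name: str
    definition: str
    sources: tuple[str, ...]


@dataclass
class ArchitectureSpec:
    sources: list[SourceSystem] = field(default_factory=list)
    entities: list[CanonicalEntity] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a message for each entity that references an uninventoried source."""
        known = {source.name for source in self.sources}
        problems = []
        for entity in self.entities:
            for source_name in entity.sources:
                if source_name not in known:
                    problems.append(
                        f"entity '{entity.name}' references unknown source '{source_name}'"
                    )
        return problems


if __name__ == "__main__":
    spec = ArchitectureSpec(
        sources=[SourceSystem("crm", "sales-ops", IntegrationPattern.CDC, True)],
        entities=[
            CanonicalEntity("Customer", "A party with a signed contract", ("crm", "billing"))
        ],
    )
    for problem in spec.validate():
        print(problem)
```

Here the check reports that "billing" is missing from the inventory. A check of this kind could run in CI so that the blueprint and the systems it describes are less likely to drift apart.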

WHEN TO USE

Commission data architecture work when systems are proliferating, when every new use case requires a custom integration, or when regulators demand a defensible data-lineage story.

Related questions.

What is data architecture?
Data architecture is the design of how data moves, is stored, and is used across an organization. It specifies source systems, ingestion patterns, storage layers (warehouse, lake, lakehouse), transformation frameworks, consumption tools, and the governance and security that surround all of it.
What is the difference between a warehouse, a lake, and a lakehouse?
A data warehouse stores structured, modeled data tuned for analytical queries (Snowflake, BigQuery, Redshift). A data lake stores raw files, often in open formats such as Parquet, on object storage (S3, ADLS). A lakehouse combines lake storage with warehouse-grade query and governance (Databricks, Snowflake with Iceberg, BigLake).
What is the modern data stack?
The modern data stack is a common pattern: cloud storage (S3/ADLS/GCS), a cloud warehouse (Snowflake/BigQuery/Redshift/Databricks), ELT ingestion (Fivetran/Airbyte/Stitch), in-warehouse transformation (dbt), BI (Tableau/Power BI/Looker), and reverse ETL (Hightouch, Census) for activation. Orchestration typically runs on Airflow, Dagster, or Prefect.
How do you choose the right architecture?
By the workloads, not the trend. Start with the analytical questions the business actually needs to answer, the rate at which new data arrives, compliance constraints, and the team's operational capacity. Architecture decisions are rarely cheap to reverse, so start with the simplest design that serves current workloads and add complexity only when a workload demands it.
How does NUUN Digital design data architectures?
We run a workload-first architecture review, document the target-state diagram against named decisions, and sequence migrations so the business sees value each quarter — not after a two-year replatform. Vendor choice follows workload, not the other way around.
