WHAT IT IS
Data lakes typically run on object storage (Amazon S3, Azure Data Lake Storage, Google Cloud Storage) with open file formats (Parquet, ORC, Avro) and a table format layer (Apache Iceberg, Delta Lake, Hudi) that adds ACID transactions, schema evolution, and time travel. The combination of lake storage plus table format is commonly called a lakehouse.
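The table format layer described above can be sketched in miniature. This is a hypothetical toy (`TinyTableFormat` is invented for illustration, not a real library): each commit publishes a new numbered manifest listing the immutable data files that make up that table version, which is roughly how Iceberg, Delta Lake, and Hudi make atomic commits and time travel possible over plain files.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of a table format layer: immutable data files plus a
# log of versioned manifests. Readers never scan the directory directly;
# they load a manifest, which pins an exact, consistent set of files.
class TinyTableFormat:
    def __init__(self, root):
        self.root = Path(root)
        (self.root / "manifests").mkdir(parents=True, exist_ok=True)

    def commit(self, data_files):
        """Publish a new table version listing `data_files`; returns the version."""
        version = len(list((self.root / "manifests").glob("v*.json")))
        manifest = self.root / "manifests" / f"v{version}.json"
        manifest.write_text(json.dumps(sorted(data_files)))
        return version

    def snapshot(self, version=None):
        """Read the file list for a version (latest if None) -- i.e. time travel."""
        manifests = sorted((self.root / "manifests").glob("v*.json"))
        chosen = manifests[-1] if version is None else manifests[version]
        return json.loads(chosen.read_text())

with tempfile.TemporaryDirectory() as d:
    table = TinyTableFormat(d)
    table.commit(["part-000.parquet"])                     # version 0
    table.commit(["part-000.parquet", "part-001.parquet"]) # version 1
    print(table.snapshot(0))  # the table as of version 0
    print(table.snapshot())   # the latest version
```

Real table formats add much more (column statistics, schema evolution metadata, conflict detection for concurrent writers), but the manifest-per-version idea is the core that turns a pile of Parquet files into a table with ACID semantics.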
HOW IT WORKS
Where a data warehouse enforces schema on write, a lake defers schema enforcement until read time. That flexibility is valuable for ML training data, event streams, clickstream logs, IoT, and audio/video — but without cataloging and lineage, a lake can become a 'data swamp' nobody trusts.
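The schema-on-read idea can be shown with a small sketch (the records and the `read_with_schema` helper are invented for illustration): raw, heterogeneous event lines are landed once, and each consumer projects its own schema onto them at read time, rather than the store rejecting writes that do not match a declared table.

```python
import json

# Hypothetical raw events landed in a lake as-is -- note the fields drift
# between records, which a schema-on-write store would reject at load time.
raw_events = [
    '{"user": "a1", "event": "click", "ts": 1700000000}',
    '{"user": "b2", "event": "purchase", "ts": 1700000060, "amount": 19.99}',
    '{"user": "c3", "event": "click"}',  # schema drift: ts missing
]

def read_with_schema(lines, fields, defaults=None):
    """Schema-on-read: project raw JSON lines onto the fields this reader wants."""
    defaults = defaults or {}
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f, defaults.get(f)) for f in fields}

# Two readers impose different schemas on the same stored data; no upfront
# table definition was needed to land the events.
clicks = list(read_with_schema(raw_events, ["user", "event"]))
revenue = list(read_with_schema(raw_events, ["user", "amount"], {"amount": 0.0}))
print(clicks[0])   # {'user': 'a1', 'event': 'click'}
print(revenue[1])  # {'user': 'b2', 'amount': 19.99}
```

The same flexibility is what creates the swamp risk: with no catalog recording which fields mean what and where they came from, every reader invents its own projection and the projections quietly diverge.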
WHEN TO USE
Adopt a data lake or lakehouse when workloads are large, semi-structured, or ML-driven; when storage cost in a warehouse is prohibitive; or when open formats and vendor independence are architectural priorities.