DATA LAKE REFERENCE ARCHITECTURE

Jul 21, 2018

INTRODUCTION

A data lake is a single platform that combines data governance, analytics, and storage. It is a secure, durable, centralized cloud-based storage platform that lets you ingest and store both structured and unstructured data, and transform the raw data assets as needed. A data lake architecture supports a comprehensive portfolio of data exploration, reporting, analytics, machine learning, and visualization on that data.

DATA LAKE VS DATA WAREHOUSE

While a data warehouse can also be a large collection of data, it is highly organized and structured. Data does not arrive in a warehouse in its original form; it is transformed and loaded into the organization predefined in the warehouse. This highly structured approach means a data warehouse is often tuned to solve a specific set of problems: the structure makes those problems easy to query, but makes others practically impossible to answer.
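As a rough illustration of this "transform, then load" idea (often called schema-on-write), here is a minimal Python sketch. The `SALES_SCHEMA` table layout, the `load_into_warehouse` function, and the sample record are all hypothetical names invented for this example; a real warehouse would enforce its schema in the database and ETL tooling, not in a dictionary.

```python
from datetime import date

# Hypothetical warehouse table schema: column name -> converter for that column.
SALES_SCHEMA = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "amount": float,
}

def load_into_warehouse(raw_record):
    """Transform a raw record into the predefined schema before storing it.

    Fields outside the schema are dropped. A missing or malformed field
    raises an error, so the record is rejected at load time.
    """
    return {
        column: convert(raw_record[column])
        for column, convert in SALES_SCHEMA.items()
    }

# Illustrative raw input: every value arrives as text, plus one extra field.
raw = {
    "order_id": "17",
    "order_date": "2018-07-21",
    "amount": "9.50",
    "clickstream": ["home", "cart"],
}
row = load_into_warehouse(raw)
```

Here `row` keeps only the three typed columns; the `clickstream` field is gone because the schema has no place for it. That is the trade-off in miniature: questions about sales are easy to answer, while questions about click behaviour can no longer be asked of this table.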

A data lake, on the other hand, can be applied to a large number and wide variety of problems. Believe it or not, this is because of the lack of structure and organization in a data lake: the lack of a predefined schema gives it more versatility and flexibility. A data lake operates in a broader, more distributed context, where some questions remain ambiguous, with an undefined set of users and a…
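The flexibility that comes from having no predefined schema can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory `lake` list stands in for cloud storage, and `ingest` and `read_with_schema` are hypothetical helpers, not part of any data lake product.

```python
import json

# Hypothetical in-memory stand-in for lake storage: raw payloads kept as-is.
lake = []

def ingest(raw_payload):
    """Store the payload untouched; no schema is enforced on write."""
    lake.append(raw_payload)

def read_with_schema(fields):
    """Apply a schema at read time by projecting JSON records onto `fields`.

    Payloads that are not JSON objects are skipped for this view;
    fields a record does not have come back as None.
    """
    rows = []
    for payload in lake:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # unstructured data stays in the lake, just not in this view
        if isinstance(record, dict):
            rows.append({field: record.get(field) for field in fields})
    return rows

# Structured, differently structured, and unstructured data all land in one place.
ingest('{"order_id": 17, "amount": 9.5, "clickstream": ["home", "cart"]}')
ingest('{"sensor": "t-1", "reading": 21.4}')
ingest("free-form log line: user 42 logged in")

# Each consumer decides which structure to impose when reading.
sales_view = read_with_schema(["order_id", "amount"])
click_view = read_with_schema(["order_id", "clickstream"])
```

Nothing is discarded at ingestion, so a view that nobody anticipated (such as `click_view`) can still be built later from the same raw data. The cost is that every reader has to cope with missing fields and unparseable payloads itself.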

Written by Kulasangar Gowrisangar
