Azure Synapse data lakehouse
Customer presentation
Intro
OQuila helps organisations transform into
data-driven organisations.
About OQuila
Data & Analytics, Internet of Things and Application
Innovation solutions
Joining forces with established IT company
Innovation & transformation with trusted technologies
Evolution of data platforms
© OQuila 2021 | 5
Data Lake vs Data Warehouse
Data Lake
Schema on read; also answers the
questions of tomorrow
Scales without limits
Can hold any type of data
Data Warehouse
Schema on write; answers the
questions of today
Mainly for relational data (tables
and rows)
Can be part of an Enterprise data
lake or lakehouse
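The difference between schema on write and schema on read can be sketched in a few lines of plain Python. This is only an illustration: the event records, the column names and both helper functions are hypothetical and not part of the OQuila framework.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced).
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": 21.5}',
    '{"device": "pump-2", "temp_c": 19.0, "vibration": 0.3}',
]

# Schema on write: the columns are fixed when the table is designed.
WAREHOUSE_COLUMNS = ("device", "temp_c")


def load_schema_on_write(lines):
    """Keep only the columns fixed in advance; other fields are dropped at load time."""
    return [tuple(json.loads(line)[col] for col in WAREHOUSE_COLUMNS) for line in lines]


def read_with_schema(lines, columns):
    """Schema on read: choose the columns at query time; missing fields become None."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]


warehouse_rows = load_schema_on_write(RAW_EVENTS)
# A question that comes up later (vibration) can still be answered from the raw data.
vibration_rows = read_with_schema(RAW_EVENTS, ("device", "vibration"))
```

The warehouse-style load answers today's question (temperature per device), while the raw data remains available for a question that was not anticipated when the table was designed.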
Overview
General principles OQuila Architecture
1. Use of standard components
2. 100% cloud services: PaaS or SaaS. No installations or virtual machines
3. No custom development
4. Use of components within the same ecosystem, e.g. Microsoft Azure Synapse
5. Minimize maintenance by using services maintained by Microsoft
6. Dynamic and scalable
Agile Data Model
No traditional schema or fixed model
RAW, STAGED, CURATED:
No rework when adding additional sources
RAW and CURATED store data separately
Preparations/calculations are done in the STAGED environment and are reusable
Supports changes to business rules with ease
Schema on read; also answers the questions of tomorrow
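One way to picture why adding a source causes no rework is a folder convention per zone. The sketch below is an assumption for illustration only: the zone/source/dataset layout and the function names are hypothetical, not the actual OQuila folder structure.

```python
from pathlib import PurePosixPath

# Zone names taken from the slide, lower-cased here for folder names.
ZONES = ("raw", "staged", "curated")


def zone_path(zone, source, dataset):
    """Build the folder path for one dataset in one zone (hypothetical layout)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return PurePosixPath(zone) / source / dataset


def paths_for_source(source, datasets):
    """A new source only adds new folders; folders of existing sources are untouched."""
    return [str(zone_path(zone, source, dataset)) for zone in ZONES for dataset in datasets]


erp_paths = paths_for_source("erp", ["orders"])
```

Under this convention, onboarding a second source such as a CRM system would only create new folders next to the existing ones.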
[Architecture diagram. Labels: Data Sources; Synapse Pipelines; Azure Synapse Analytics; Data Lake Gen 2 (RAW, STAGE, CURATED); Cleansing and Transformations via Spark clusters; Synapse Data Flow: Monitoring Quality of Data (Validated, Anomaly); On demand SQL pool; Power BI; Excel; Power Apps; Automation Flows; Azure Machine Learning]
Synapse components
Data pipelines:
Many standard connectors (SQL, Oracle, CSV, API, …)
Data extraction from online and on-prem systems
Add new systems easily
Data Lake:
RAW, STAGE and CURATED folders (reflecting the maturity and correctness of the data)
Parquet files, to work efficiently with large amounts of data
Spark Cluster:
Performant transformation and cleansing actions via notebooks
Transfers “edited” data to the next stage (RAW, STAGE, CURATED)
Synapse Data flows:
Define business rules via a graphical designer (missing values, inconsistencies, …)
Puts anomalies in a separate STAGE environment
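The idea of checking business rules and routing anomalies to a separate place can be sketched in plain Python. In Synapse this is configured in the graphical data flow designer, so the code below is only a conceptual stand-in: the rules, field names and the `split_rows` function are hypothetical.

```python
def is_missing(value):
    """Treat None and empty strings as missing values."""
    return value is None or value == ""


# Each rule is (name, predicate); the predicate returns True when the row violates the rule.
RULES = [
    ("missing customer_id", lambda row: is_missing(row.get("customer_id"))),
    ("negative amount", lambda row: (row.get("amount") or 0) < 0),
]


def split_rows(rows, rules=RULES):
    """Return (validated, anomalies); each anomaly records which rules it broke."""
    validated, anomalies = [], []
    for row in rows:
        broken = [name for name, violates in rules if violates(row)]
        if broken:
            anomalies.append({**row, "violations": broken})
        else:
            validated.append(row)
    return validated, anomalies


rows = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "", "amount": 35.0},
    {"customer_id": "C3", "amount": -5.0},
]
validated, anomalies = split_rows(rows)
```

Keeping the violated rule names with each anomaly makes it easier to report on data quality per rule.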
Synapse components
On demand SQL Pool:
Built into Azure Synapse
Links directly to Parquet files in CURATED zone (without having to copy data to tables).
Row level security
Allows access to data via:
Queries
Power BI
Excel
Automation tools
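Querying Parquet files in place, without copying them into tables, typically relies on the `OPENROWSET` function of the on-demand (serverless) SQL pool. The Python helper below merely assembles such a query as text; the storage account, container and folder names are placeholders, and `parquet_query` is a hypothetical helper, not part of any Synapse SDK.

```python
def parquet_query(storage_account, container, folder, top=10):
    """Build a T-SQL query that reads Parquet files directly from the data lake.

    All arguments are illustrative placeholders; only pass trusted values,
    since they are inserted into the query text as-is.
    """
    url = f"https://{storage_account}.dfs.core.windows.net/{container}/{folder}/*.parquet"
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM OPENROWSET(BULK '{url}', FORMAT = 'PARQUET') AS [result]"
    )


# Hypothetical account, container and curated folder.
query = parquet_query("examplelake", "datalake", "curated/erp/orders")
print(query)
```

The resulting statement can be run from a query window, or wrapped in a view so that Power BI and Excel users see it as a regular table.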
Synapse Data Flow
Our PoV/PoC approach
Dream Big, Start Small, Grow Fast
Synapse-based Data Platform
Proof of Value
Rollout 2
Rollout 3
Rollout 4
...
Proof of Concept Project approach
Make smart choices about the scope
Define the ‘low hanging fruit’ data sources eligible for the PoC
Define a quick-win report
Define a lean & mean project team
After kick-off, OQuila will:
Set up the Azure environment
Set up OQuila’s Synapse data lakehouse framework
Set up and deploy the selected data pipeline(s)
Build the report
Document the solution
Present the solution
Ready to use and grow!
Thank you!