Synapse components
• Data pipelines:
  • Many standard connectors (SQL, Oracle, CSV, API, …)
  • Data extraction from both online (cloud) and on-premises systems
  • New source systems can be added easily
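In Synapse, pipelines are configured (linked services, datasets, copy activities) rather than hand-coded. The Python sketch below only illustrates why adding a new system is cheap when each source type plugs into a common extraction interface. The registry, the decorator and the `csv` / `json_api` source names are hypothetical and not part of any Synapse API.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Hypothetical registry: maps a source type name to an extractor function.
# Each extractor takes a raw payload and returns the records as dicts.
Extractor = Callable[[str], List[dict]]
CONNECTORS: Dict[str, Extractor] = {}


def register_connector(source_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor under a source type name."""
    def decorator(func: Extractor) -> Extractor:
        CONNECTORS[source_type] = func
        return func
    return decorator


@register_connector("csv")
def extract_csv(payload: str) -> List[dict]:
    # Parse CSV text that has a header row into a list of dicts.
    return [dict(row) for row in csv.DictReader(io.StringIO(payload))]


@register_connector("json_api")
def extract_json(payload: str) -> List[dict]:
    # Treat the payload as a JSON response: an array of objects or a single object.
    data = json.loads(payload)
    return data if isinstance(data, list) else [data]


def extract(source_type: str, payload: str) -> List[dict]:
    """Dispatch to the registered connector; unknown source types are an error."""
    if source_type not in CONNECTORS:
        raise ValueError(f"No connector registered for {source_type!r}")
    return CONNECTORS[source_type](payload)
```

Supporting a new system then means writing one more decorated function; the calling code (`extract("csv", ...)`) does not change.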
• Data Lake:
  • RAW, STAGE and CURATED folders (reflecting the level of maturity and correctness of the data)
  • Parquet files for efficient processing of large data volumes
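A minimal sketch of how the three maturity folders might be addressed in an ADLS Gen2 data lake, assuming an `abfss://<container>@<account>.dfs.core.windows.net/` URI. The folder layout below (zone/source/entity/yyyy/mm/dd) and the account and container names in the usage are hypothetical conventions, not something Synapse prescribes. Spark typically writes a Parquet dataset as a folder of part files, so the function returns a folder path.

```python
from datetime import date

# The three maturity levels of the data lake, in processing order.
ZONES = ("RAW", "STAGE", "CURATED")


def zone_path(account: str, container: str, zone: str, source: str,
              entity: str, load_date: date) -> str:
    """Build a hypothetical ADLS Gen2 folder path for one entity in one zone."""
    if zone not in ZONES:
        raise ValueError(f"Unknown zone {zone!r}; expected one of {ZONES}")
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{zone}/{source}/{entity}/{load_date:%Y/%m/%d}/"
    )


def next_zone(zone: str) -> str:
    """Return the zone that data is promoted to after `zone`."""
    i = ZONES.index(zone)
    if i == len(ZONES) - 1:
        raise ValueError("CURATED is the final zone")
    return ZONES[i + 1]
```

For example, `zone_path("mylake", "datalake", "RAW", "erp", "customers", date(2021, 3, 15))` gives `abfss://datalake@mylake.dfs.core.windows.net/RAW/erp/customers/2021/03/15/`.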
• Spark Cluster:
  • High-performance transformation and cleansing via notebooks
  • Writes the processed data to the next stage (RAW → STAGE → CURATED)
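In a Synapse notebook this step would usually be written in PySpark (for example with `pyspark.sql.functions.trim` and `DataFrame.dropDuplicates`). To keep the example runnable without a Spark cluster, the sketch below expresses a typical RAW-to-STAGE cleansing step in plain Python: trim text fields, treat empty strings as missing, and drop duplicate records by key. The function name and the choice of rules are illustrative assumptions.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def _clean_str(value: Optional[str]) -> Optional[str]:
    # Trim whitespace and treat empty strings as missing (None).
    if value is None:
        return None
    value = value.strip()
    return value or None


def cleanse(rows: List[Row], key: str) -> List[Row]:
    """Normalise string fields and drop duplicate rows by `key`.

    Keeps the first occurrence of each key. Rows without a key are kept,
    so that a later business-rule step can flag them as anomalies.
    """
    seen = set()
    result: List[Row] = []
    for row in rows:
        cleaned = {col: _clean_str(val) for col, val in row.items()}
        k = cleaned.get(key)
        if k is not None:
            if k in seen:
                continue  # duplicate key: skip this row
            seen.add(k)
        result.append(cleaned)
    return result
```

Note that Spark's `dropDuplicates` does not guarantee which duplicate survives, whereas this sketch always keeps the first one.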
• Synapse Data flows:
  • Business rules (missing values, inconsistencies, …) defined in a graphical designer
  • Routes anomalies to a separate STAGE environment
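In Synapse the rules are drawn in the graphical designer (for instance with a Conditional Split transformation in a mapping data flow) rather than coded. The Python sketch below only mirrors that logic: each row is checked against named rules, and failing rows are routed to a separate anomalies set tagged with the rules they broke. The two example rules and the `start`/`end` field names are hypothetical; the date check assumes ISO-formatted date strings, which compare correctly as text.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# A rule is a name plus a predicate that returns True when the row is valid.
Rule = Tuple[str, Callable[[Row], bool]]

# Hypothetical rules: one for a missing value, one for an inconsistency.
RULES: List[Rule] = [
    ("missing_id", lambda r: r.get("id") is not None),
    # If both dates are present, the start must not be after the end.
    ("end_before_start",
     lambda r: not (r.get("start") and r.get("end")) or r["start"] <= r["end"]),
]


def split_anomalies(rows: List[Row], rules: List[Rule]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, anomalies); anomalies list the failed rule names."""
    valid: List[Row] = []
    anomalies: List[Row] = []
    for row in rows:
        failed = [name for name, check in rules if not check(row)]
        if failed:
            anomalies.append({**row, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, anomalies
```

Keeping the failed rule names on each anomaly makes the separate STAGE environment easier to review, because it records why a row was rejected.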
© OQuila 2021 | 11