Course Outline
Introduction, Objectives, and Migration Strategy
- Course goals, alignment with participant profiles, and success criteria
- High-level migration approaches and risk considerations
- Configuring workspaces, repositories, and lab datasets
Day 1 — Migration Fundamentals and Architecture
- Core Lakehouse concepts, an overview of Delta Lake, and Databricks architecture
- Differences between SMP and MPP architectures and their implications for migration
- Designing the Medallion (Bronze→Silver→Gold) pattern and an overview of Unity Catalog
Day 1 Lab — Translating a Stored Procedure
- Hands-on migration of a sample stored procedure into a notebook
- Mapping temporary tables and cursors to DataFrame transformations
- Validating and comparing results against the original output
Day 2 — Advanced Delta Lake & Incremental Loading
- ACID transactions, commit logs, versioning, and time travel features
- Auto Loader, MERGE INTO patterns, upserts, and schema evolution
- OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning techniques
Day 2 Lab — Incremental Ingestion & Optimization
- Implementing Auto Loader ingestion and MERGE workflows
- Applying OPTIMIZE, Z-ORDER, and VACUUM; validating outcomes
- Measuring improvements in read/write performance
Day 3 — SQL in Databricks, Performance & Debugging
- Analytical SQL features: window functions, higher-order functions, and JSON/array handling
- Interpreting the Spark UI: DAGs, shuffles, stages, tasks, and bottleneck diagnosis
- Query tuning patterns: broadcast joins, hints, caching, and spill reduction
Day 3 Lab — SQL Refactoring & Performance Tuning
- Refactoring a complex SQL process into optimized Spark SQL
- Using Spark UI traces to identify and resolve skew and shuffle issues
- Benchmarking before/after results and documenting tuning steps
Day 4 — Tactical PySpark: Replacing Procedural Logic
- Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
- Transforming loops and cursors into vectorized DataFrame operations
- Modularization, UDFs/pandas UDFs, widgets, and reusable libraries
Day 4 Lab — Refactoring Procedural Scripts
- Refactoring a procedural ETL script into modular PySpark notebooks
- Introducing parametrization, unit-style tests, and reusable functions
- Conducting code reviews and applying best-practice checklists
Day 5 — Orchestration, End-to-End Pipeline & Best Practices
- Databricks Workflows: job design, task dependencies, triggers, and error handling
- Designing incremental Medallion pipelines with quality rules and schema validation
- Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
Day 5 Lab — Build a Complete End-to-End Pipeline
- Assembling a Bronze→Silver→Gold pipeline orchestrated with Workflows
- Implementing logging, auditing, retries, and automated validations
- Running the full pipeline, validating outputs, and preparing deployment notes
Operationalization, Governance, and Production Readiness
- Unity Catalog governance, lineage, and access-control best practices
- Cost management, cluster sizing, autoscaling, and job concurrency patterns
- Deployment checklists, rollback strategies, and runbook creation
Final Review, Knowledge Transfer, and Next Steps
- Participant presentations of migration work and lessons learned
- Gap analysis, recommended follow-up activities, and training materials handoff
- References, further learning paths, and support options
Requirements
- A solid understanding of data engineering concepts
- Practical experience with SQL and stored procedures (including Synapse or SQL Server)
- Familiarity with ETL orchestration concepts (such as ADF or similar tools)
Target Audience
- Technology managers with a data engineering background
- Data engineers looking to transition procedural OLAP logic to Lakehouse patterns
- Platform engineers responsible for overseeing Databricks adoption
35 Hours