Data Engineering with Databricks

Course ID: DTB-DED
Duration: 2 Days
Private in-house training

Apart from public, instructor-led classes, we also offer private in-house training for organizations, tailored to their needs. Call us at +852 2116 3328 or email us at [email protected] for more details.

What skills are covered
  • Data Ingestion with Lakeflow Connect
  • Deploy Workloads with Lakeflow Jobs
  • Build Data Pipelines with Lakeflow Spark Declarative Pipelines
  • Data Management and Governance with Unity Catalog
Who should attend this course
  • Anyone interested in data engineering on the Databricks platform
Course Modules

Module 1: Data Ingestion with Lakeflow Connect

  • Introduction to Data Engineering in Databricks
  • Cloud Storage Ingestion with Lakeflow Connect Standard Connectors
  • Enterprise Data Ingestion with Lakeflow Connect Managed Connectors
  • Ingestion Alternatives

Module 2: Deploy Workloads with Lakeflow Jobs

  • Introduction to Data Engineering in Databricks
  • Lakeflow Jobs Core Concepts
  • Creating and Scheduling Jobs
  • Advanced Lakeflow Jobs Features

Module 3: Build Data Pipelines with Lakeflow Spark Declarative Pipelines

  • Introduction to Data Engineering in Databricks
  • Lakeflow Spark Declarative Pipeline Fundamentals
  • Building Lakeflow Spark Declarative Pipelines

Module 4: Data Management and Governance with Unity Catalog

  • Data Governance Overview
  • Demo: Populating the Metastore
  • Lab: Navigating the Metastore
  • Organization and Access Patterns
  • Demo: Upgrading Tables to Unity Catalog
  • Security and Administration in Unity Catalog
  • Databricks Marketplace Overview
  • Privileges in Unity Catalog
  • Demo: Controlling Access to Data
  • Fine-Grained Access Control
  • Lab: Migrating and Managing Data in Unity Catalog
Prerequisites

1. Data Ingestion with Lakeflow Connect

  • Basic understanding of the Databricks Data Intelligence platform, including Databricks Workspaces, Apache Spark, Delta Lake, the Medallion Architecture and Unity Catalog.
  • Experience working with various file formats (e.g., Parquet, CSV, JSON, TXT).
  • Proficiency in SQL and Python.
  • Familiarity with running code in Databricks notebooks.

2. Deploy Workloads with Lakeflow Jobs

  • Beginner familiarity with basic cloud concepts (virtual machines, object storage, identity management)
  • Ability to perform basic code development tasks (create compute, run code in notebooks, use basic notebook operations, import repos from Git, etc.)
  • Intermediate familiarity with basic SQL concepts (CREATE, SELECT, INSERT, UPDATE, DELETE, WHERE, GROUP BY, JOIN, etc.)

3. Build Data Pipelines with Lakeflow Spark Declarative Pipelines

  • Basic understanding of the Databricks Data Intelligence platform, including Databricks Workspaces, Apache Spark, Delta Lake, the Medallion Architecture and Unity Catalog.
  • Experience ingesting raw data into Delta tables, including using the read_files SQL function to load formats like CSV, JSON, TXT, and Parquet (a brief illustrative sketch follows this list).
  • Proficiency in transforming data using SQL, including writing intermediate-level queries and a basic understanding of SQL joins.
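
As a rough indication of the level expected (this is not part of the course materials), the following minimal sketch loads hypothetical CSV files into a Delta table with the read_files SQL function from a Databricks notebook; the volume path and table name are made up for illustration:

    # Minimal sketch, assuming a Databricks notebook where `spark` (a SparkSession)
    # is already available. The volume path and table name below are hypothetical.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS bronze_orders AS
        SELECT *
        FROM read_files(
            '/Volumes/demo/raw/orders/',   -- hypothetical source location
            format => 'csv',
            header => true
        )
    """)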

4. Data Management and Governance with Unity Catalog

  • Beginner familiarity with cloud computing concepts (virtual machines, object storage, etc.)
  • Intermediate experience with basic SQL concepts such as SQL commands, aggregate functions, filters and sorting, indexes, tables, and views.
  • Basic knowledge of Python programming, the Jupyter notebook interface, and PySpark fundamentals (a short illustrative sketch follows).
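
To indicate the level of PySpark fluency expected, here is a minimal sketch of the kind of code a participant should be comfortable reading and running; the catalog, schema, table, and column names are hypothetical:

    # Minimal sketch, assuming a Databricks notebook where `spark` is already
    # available. The table and column names below are hypothetical.
    from pyspark.sql import functions as F

    orders = spark.table("main.sales.orders")    # read a Unity Catalog table
    summary = (
        orders
        .filter(F.col("status") == "COMPLETE")   # basic filtering
        .groupBy("region")                       # simple aggregation
        .agg(F.count("*").alias("order_count"))
    )
    summary.show()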
