Apache Spark Programming with Databricks

Course ID: DTB-ASPD
Duration: 2 Days
Private in-house training

In addition to public, instructor-led classes, we offer private in-house training tailored to your organization's needs. Call us at +852 2116 3328 or email us at [email protected] for more details.

What are the skills covered
  • Introduction to Apache Spark
  • Developing Applications with Apache Spark
  • Stream Processing and Analysis with Apache Spark
  • Monitoring and Optimizing Apache Spark Workloads on Databricks
Who should attend this course
  • Everyone who is interested
Course Modules

Module 1: Introduction to Apache Spark

  • Spark Runtime Architecture
  • Exploring Apache Spark Architecture in Databricks
  • Introduction to Spark DataFrames and SQL
  • Reading and Writing Data with DataFrames
  • Distributed System Programming Fundamentals
  • Basic ETL with the DataFrame API
  • Flight Data ETL with the DataFrame API
  • Analyzing Transaction Data with DataFrames

Module 2: Developing Applications with Apache Spark

  • DataFrame API Basics
  • Demo: (Optional) Basic ETL with the DataFrame API
  • Grouping and Aggregating Data
  • Demo: Grouping and Aggregating Data
  • Lab: Grouping and Aggregating E-Commerce Data
  • Relational Operations
  • Demo: Data Relational Operations in Apache Spark
  • Working with Complex Data
  • Demo: Working with Complex Data Types in Apache Spark
  • Lab: Working with Complex Data Types in E-Commerce Data

Module 3: Stream Processing and Analysis with Apache Spark

  • Introduction to Stream Processing
  • Spark Structured Streaming
  • Demo: Introduction to Spark Structured Streaming
  • Lab: Introduction to Spark Structured Streaming
  • Advanced Stream Processing and Analysis
  • Demo: Window Aggregation in Spark Structured Streaming
  • Lab: Window Aggregation in Spark Structured Streaming

Module 4: Monitoring and Optimizing Apache Spark Workloads on Databricks

  • Apache Spark and Databricks
  • Using Apache Spark with Delta Lake
  • Demo: Introduction to Delta Lake
  • Lab: Introduction to Delta Lake
  • Optimizing Apache Spark
  • Demo: Optimizing Apache Spark
  • Lab: Optimizing Apache Spark
Prerequisites
  • Basic programming knowledge
  • Familiarity with Python
  • Basic understanding of SQL queries (SELECT, JOIN, GROUP BY)
  • Familiarity with data processing concepts
  • No prior Spark or Databricks experience required
