Courses Archive - KORNERSTONE

DP-3011

DP-3011: Implement a Data Analytics Solution with Azure Databricks

This course explores how to use Databricks and Apache Spark on Azure to take data projects from exploration to production.

Learn how to ingest, transform, and analyze large-scale datasets with Spark DataFrames, Spark SQL, and PySpark
Build confidence in managing distributed data processing
Get hands-on with the Databricks workspace—navigating clusters and creating and optimizing Delta tables
Dive into data engineering practices, including designing ETL pipelines, handling schema evolution, and enforcing data quality.
Automate and manage workloads with Lakeflow Jobs and pipelines
Explore governance and security capabilities such as Unity Catalog and Purview integration

1 Day

HK$3500

DP-3028

DP-3028: Implement Generative AI engineering with Azure Databricks

This course covers generative AI engineering on Azure Databricks, using Spark to explore, fine-tune, evaluate, and integrate advanced language models. It teaches how to implement techniques like retrieval-augmented generation (RAG) and multi-stage reasoning, as well as how to fine-tune large language models for specific tasks and evaluate their performance.

Students will also learn about responsible AI practices for deploying AI solutions and how to manage models in production using LLMOps (Large Language Model Operations) on Azure Databricks.

1 Day

HK$3500

DP-3027

DP-3027: Implement a data engineering solution with Azure Databricks

Learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.

1 Day

HK$3500

DTB-GAED

Generative AI Engineering with Databricks

This course is aimed at data scientists, machine learning engineers, and other data practitioners who want to build generative AI applications using the latest and most popular frameworks and Databricks capabilities.

Below, we describe each of the four, four-hour modules included in this course.

Generative AI Solution Development: This is your introduction to contextual generative AI solutions using the retrieval-augmented generation (RAG) method. First, you’ll be introduced to RAG architecture and the significance of contextual information using Mosaic AI Playground. Next, we’ll show you how to prepare data for generative AI solutions and connect this process with building a RAG architecture. Finally, you’ll explore concepts related to context embedding, vectors, vector databases, and the utilization of Mosaic AI Vector Search.

Generative AI Application Development: Ready for information and practical experience in building advanced LLM applications using multi-stage reasoning LLM chains and agents? In this module, you’ll first learn how to decompose a problem into its components and select the most suitable model for each step to enhance business use cases. Following this, we’ll show you how to construct a multi-stage reasoning chain utilizing LangChain and HuggingFace transformers. Finally, you’ll be introduced to agents and will design an autonomous agent using generative models on Databricks.

Generative AI Application Evaluation and Governance: This is your introduction to evaluating and governing generative AI systems. First, you’ll explore the meaning behind and motivation for building evaluation and governance/security systems. Next, we’ll connect evaluation and governance systems to the Databricks Data Intelligence Platform. Third, we’ll teach you about a variety of evaluation techniques for specific components and types of applications. Finally, the course will conclude with an analysis of evaluating entire AI systems with respect to performance and cost.

Generative AI Application Deployment and Monitoring: Ready to learn how to deploy, operationalize, and monitor generative deploying, operationalizing, and monitoring generative AI applications? This module will help you gain skills in the deployment of generative AI applications using tools like Model Serving. We’ll also cover how to operationalize generative AI applications following best practices and recommended architectures. Finally, we’ll discuss the idea of monitoring generative AI applications and their components using Lakehouse Monitoring.

2 Days

DTB-MLD

Machine Learning with Databricks

Welcome to Machine Learning with Databricks!
This course is your gateway to mastering machine learning workflows on Databricks. Dive into data preparation, model development, deployment, and operations, guided by expert instructors. Learn essential skills for data exploration, model training, and deployment strategies tailored for Databricks. By course end, you’ll have the knowledge and confidence to navigate the entire machine learning lifecycle on the Databricks platform, empowering you to build and deploy robust machine learning solutions efficiently.

2 Days

DTB-ADED

Advanced Data Engineering with Databricks

This course serves as an appropriate entry point to learn Advanced Data Engineering with Databricks.

Below, we describe each of the four, four-hour modules included in this course.

Databricks Streaming and Lakeflow Spark Declarative Pipelines

This course provides a comprehensive understanding of Spark Structured Streaming and Delta Lake, including computation models, configuration for streaming read, and maintaining data quality in a streaming environment.

Databricks Data Privacy

This content is intended for the learner persona of data engineers or for customers, partners, and employees who complete data engineering tasks with Databricks. It aims to provide them with the necessary knowledge and skills to execute these activities effectively on the Databricks platform.

Databricks Performance Optimization

In this course, you’ll learn how to optimize workloads and physical layout with Spark and Delta Lake and and analyze the Spark UI to assess performance and debug applications. We’ll cover topics like streaming, liquid clustering, data skipping, caching, photons, and more.

Automated Deployment with Databricks Asset Bundles

This course provides a comprehensive review of DevOps principles and their application to Databricks projects. It begins with an overview of core DevOps, DataOps, continuous integration (CI), continuous deployment (CD), and testing, and explores how these principles can be applied to data engineering pipelines.

The course then focuses on continuous deployment within the CI/CD process, examining tools like the Databricks REST API, SDK, and CLI for project deployment. You will learn about Databricks Asset Bundles (DABs) and how they fit into the CI/CD process. You’ll dive into their key components, folder structure, and how they streamline deployment across various target environments in Databricks. You will also learn how to add variables, modify, validate, deploy, and execute Databricks Asset Bundles for multiple environments with different configurations using the Databricks CLI.

Finally, the course introduces Visual Studio Code as an Interactive Development Environment (IDE) for building, testing, and deploying Databricks Asset Bundles locally, optimizing your development process. The course concludes with an introduction to automating deployment pipelines using GitHub Actions to enhance the CI/CD workflow with Databricks Asset Bundles.

By the end of this course, you will be equipped to automate Databricks project deployments with Databricks Asset Bundles, improving efficiency through DevOps practices.

2 Days

DTB-DED

Data Engineering with Databricks

This is an introductory course that serves as an appropriate entry point to learn Data Engineering with Databricks.

Below, we describe each of the four, four-hour modules included in this course.

1. Data Ingestion with Lakeflow Connect

This course provides a comprehensive introduction to Lakeflow Connect as a scalable and simplified solution for ingesting data into Databricks from a variety of data sources. You will begin by exploring the different types of connectors within Lakeflow Connect (Standard and Managed), learn about various ingestion techniques, including batch, incremental batch, and streaming, and then review the key benefits of Delta tables and the Medallion architecture.

From there, you will gain practical skills to efficiently ingest data from cloud object storage using Lakeflow Connect Standard Connectors with methods such as CREATE TABLE AS (CTAS), COPY INTO, and Auto Loader, along with the benefits and considerations of each approach. You will then learn how to append metadata columns to your bronze level tables during ingestion into the Databricks data intelligence platform. This is followed by working with the rescued data column, which handles records that don’t match the schema of your bronze table, including strategies for managing this rescued data.

The course also introduces techniques for ingesting and flattening semi-structured JSON data, as well as enterprise-grade data ingestion using Lakeflow Connect Managed Connectors.

Finally, learners will explore alternative ingestion strategies, including MERGE INTO operations and leveraging the Databricks Marketplace, equipping you with foundational knowledge to support modern data engineering ingestion.

2. Deploy Workloads with Lakeflow Jobs

Deploy Workloads with Lakeflow Jobs course teaches how to orchestrate and automate data, analytics, and AI workflows using Lakeflow Jobs. You will learn to make robust, production-ready pipelines with flexible scheduling, advanced orchestration, and best practices for reliability and efficiency-all natively integrated within the Databricks Data intelligence Platform. Prior experience with Databricks, Python and SQL is recommended.

3. Build Data Pipelines with Lakeflow Spark Declarative Pipelines

This course introduces users to the essential concepts and skills needed to build data pipelines using Lakeflow Spark Declarative Pipelines (SDP) in Databricks for incremental batch or streaming ingestion and processing through multiple streaming tables and materialized views. Designed for data engineers new to Spark Declarative Pipelines, the course provides a comprehensive overview of core components such as incremental data processing, streaming tables, materialized views, and temporary views, highlighting their specific purposes and differences.

Topics covered include:

– Developing and debugging ETL pipelines with the multi-file editor in Spark Declarative Pipelines using SQL (with Python code examples provided)

– How Spark Declarative Pipelines track data dependencies in a pipeline through the pipeline graph

– Configuring pipeline compute resources, data assets, trigger modes, and other advanced options

Next, the course introduces data quality expectations in Spark Declarative Pipelines, guiding users through the process of integrating expectations into pipelines to validate and enforce data integrity. Learners will then explore how to put a pipeline into production, including scheduling options, and enabling pipeline event logging to monitor pipeline performance and health.

Finally, the course covers how to implement Change Data Capture (CDC) using the AUTO CDC INTO syntax within Spark Declarative Pipelines to manage slowly changing dimensions (SCD Type 1 and Type 2), preparing users to integrate CDC into their own pipelines.

4. Data Management and Governance with Unity Catalog

In this course, you’ll learn about data management and governance using Databricks Unity Catalog. It covers foundational concepts of data governance, complexities in managing data lakes, Unity Catalog’s architecture, security, administration, and advanced topics like fine-grained access control, data segregation, and privilege management.

* This course seeks to prepare students to complete the Associate Data Engineering certification exam, and provides the requisite knowledge to take the course Advanced Data Engineering with Databricks.

2 Days

DTB-DAD

Data Analysis with Databricks

This course provides a comprehensive introduction to Databricks SQL. Learners will ingest data, write queries, produce visualizations and dashboards, and configure alerts. This course will prepare you to take the Databricks Certified Data Analyst Associate exam.

This course consists of two four-hour modules.

SQL Analytics on Databricks

In this course, you’ll learn how to effectively use Databricks for data analytics, with a specific focus on Databricks SQL. As a Databricks Data Analyst, your responsibilities will include finding relevant data, analyzing it for potential applications, and transforming it into formats that provide valuable business insights.

You will also understand your role in managing data objects and how to manipulate them within the Databricks Data Intelligence Platform, using tools such as Notebooks, the SQL Editor, and Databricks SQL.

Additionally, you will learn about the importance of Unity Catalog in managing data assets and the overall platform. Finally, the course will provide an overview of how Databricks facilitates performance optimization and teach you how to access Query Insights to understand the processes occurring behind the scenes when executing SQL analytics on Databricks.

AI/BI for Data Analysts

In this course, you’ll learn how to use the features Databricks provides for business intelligence needs: AI/BI Dashboards and AI/BI Genie. As a Databricks Data Analyst, you will be tasked with creating AI/BI Dashboards and AI/BI Genie Spaces within the platform, managing the access to these assets by stakeholders and necessary parties, and maintaining these assets as they are edited, refreshed, or decommissioned over the course of their lifespan. This course intends to instruct participants on how to design dashboards for business insights, share those with collaborators and stakeholders, and maintain those assets within the platform. Participants will also learn how to utilize AI/BI Genie Spaces to support self-service analytics through the creation and maintenance of these environments powered by the Databricks Data Intelligence Engine.

1 Day

DTB-ASPD

Apache Spark Programming with Databricks

This course serves as an appropriate entry point to learn Apache Spark Programming with Databricks.

Below, we describe each of the four, four-hour modules included in this course.

Introduction to Apache Spark

This course offers essential knowledge of Apache Spark, with a focus on its distributed architecture and practical applications for large-scale data processing. Participants will explore programming frameworks, learn the Spark DataFrame API, and develop skills for reading, writing, and transforming data using Python-based Spark workflows.

Developing Applications with Apache Spark

Master scalable data processing with Apache Spark in this hands-on course. Learn to build efficient ETL pipelines, perform advanced analytics, and optimize distributed data transformations using Spark’s DataFrame API. Explore grouping, aggregation, joins, set operations, and window functions. Work with complex data types like arrays, maps, and structs while applying best practices for performance optimization.

Stream Processing and Analysis with Apache Spark

Learn the essentials of stream processing and analysis with Apache Spark in this course. Gain a solid understanding of stream processing fundamentals and develop applications using the Spark Structured Streaming API. Explore advanced techniques such as stream aggregation and window analysis to process real-time data efficiently. This course equips you with the skills to create scalable and fault-tolerant streaming applications for dynamic data environments.

Monitoring and Optimizing Apache Spark Workloads on Databricks

This course explores the Lakehouse architecture and Medallion design for scalable data workflows, focusing on Unity Catalog for secure data governance, access control, and lineage tracking. The curriculum includes building reliable, ACID-compliant pipelines with Delta Lake. You’ll examine Spark optimization techniques, such as partitioning, caching, and query tuning, and learn performance monitoring, troubleshooting, and best practices for efficient data engineering and analytics to address real-world challenges.

2 Days

PMI Construction Professional (PMI-CP)™

The PMI Construction Professional (PMI-CP) certification is the only internationally recognized credential offering an in-depth curriculum tailored to the construction industry. Designed for professionals with at least 3 years of experience in construction or built environment projects, PMI-CP equips you with the skills to lead, plan, and manage contracts while navigating the complexities of modern construction projects.

Cortex XSIAM

Palo Alto – Security Operations, Integration, and Automation

XSIAM is the industry’s most comprehensive security incident and asset management platform, offering extensive coverage for securing and managing infrastructure, workloads, and applications across multiple environments.

The course is designed to enable cybersecurity professionals, particularly those in SOC/CERT/CSIRT and engineering roles, to use XSIAM. The course reviews XSIAM intricacies, from fundamental components to advanced strategies and techniques, including skills needed to configure security integrations, develop automation workflows, manage indicators, and optimize dashboards for enhanced security operations.

3 Days

EDU-210

Palo Alto – Firewall 11.1 Essentials: Configuration and Management

Successful completion of this five-day, instructor-led course should enhance the student’s understanding of configuring and managing Palo Alto Networks Next-Generation Firewalls. The course includes hands-on experience configuring, managing, and monitoring a firewall in a lab environment.

5 Days

Search

Search for a course