Data Engineering on Google Cloud Platform

Duration: 4 Days
Training Fee: HK$26,000
Private in-house training
In addition to public, instructor-led classes, we offer private in-house training tailored to an organization's needs. Call us at +852 2116 3328 or email us at [email protected] for more details.
Course Objectives

This course teaches participants the following skills:

  • Design and build data processing systems on Google Cloud Platform
  • Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
  • Derive business insights from extremely large datasets using Google BigQuery
  • Train, evaluate, and predict with machine learning models using TensorFlow and Cloud ML
  • Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
  • Enable instant insights from streaming data

To get the most out of this course, participants should have:

  • Completed the Google Cloud Fundamentals: Big Data & Machine Learning course, or have equivalent experience
  • Basic proficiency with a common query language such as SQL
  • Experience with data modeling and extract, transform, load (ETL) activities
  • Experience developing applications using a common programming language such as Python
  • Familiarity with machine learning and/or statistics
Intended Audience

This class is intended for experienced developers who are responsible for managing big data transformations including:

  • Extracting, loading, transforming, cleaning, and validating data
  • Designing pipelines and architectures for data processing
  • Creating and maintaining machine learning and statistical models
  • Querying datasets, visualizing query results and creating reports
Delivery Method
  • Instructor-led, instructor-led online
Course Outline

The course includes presentations, demonstrations, and hands-on labs.


Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform
Module 1: Google Cloud Dataproc Overview
Module 2: Running Dataproc Jobs
Module 3: Integrating Dataproc with Google Cloud Platform
Module 4: Making Sense of Unstructured Data with Google’s Machine Learning APIs


Serverless Data Analysis with Google BigQuery and Cloud Dataflow
Module 5: Serverless data analysis with BigQuery
Module 6: Serverless, autoscaling data pipelines with Dataflow


Serverless Machine Learning with TensorFlow on Google Cloud Platform 
Module 7: Getting started with Machine Learning
Module 8: Building ML models with TensorFlow
Module 9: Scaling ML models with Cloud ML
Module 10: Feature Engineering


Building Resilient Streaming Systems on Google Cloud Platform
Module 11: Architecture of streaming analytics pipelines
Module 12: Ingesting Variable Volumes
Module 13: Implementing streaming pipelines
Module 14: Streaming analytics and dashboards
Module 15: High throughput and low latency with Bigtable
