Module 3

Data Analytics Essentials Certification Preparatory Course

Duration: 12 hours


This qualification is intended for individuals who aspire to become Citizen Data Scientists in their organization, using open-source technology to perform moderate to sophisticated diagnostic analytics and simple predictive analytics. The Data Analytics Essentials for Citizen Data Scientist qualification is also highly relevant to other key staff involved in the requirements input, design, development, delivery and ultimate use of digital initiatives, including data consumers, digital initiative decision makers, business analysts, and operational line managers and staff.

For private classes, please contact us at (852) 2116 3328 for more details.

Certified Skills

  • Create data models for analytics functions across multiple data sources
  • Prepare data with SQL Lab and perform diagnostic analytics with Superset
  • Select an appropriate machine learning algorithm for the task at hand using the MLlib and PySpark APIs

Intended Audience

Anyone who wants to:

  • Prove their ability to perform self-service diagnostic analytics for insights
  • Demonstrate the value of using low-cost, high-return open-source technology to improve daily performance
  • Show their readiness to work productively with colleagues using data and analytics


Prerequisites

  • Completion of CDPOS Module 2
  • Basic computer software skills
  • Basic internet skills


Examination

50 multiple-choice questions | 75 minutes (Module 3)


Syllabus Highlights

Analytics Process

  • The analytics process of diagnostic and predictive analytics
  • Data prep tasks - data collection, data cleansing, data munging and data visualisation
  • Build analytics model - convert unstructured data into quantified metrics

Diagnostic Analytics Essentials

  • Diagnostic analytics objectives, processes, data prepping and
  • Data modelling for diagnostic analytics with Hive, Spark SQL and PySpark
  • Data visualisation for diagnostic analytics - Apache Superset

Predictive Analytics Essentials

  • Predictive analytics objectives, best practices, processes, data prepping and model building using Hive, Spark SQL and PySpark
  • Recognise and select an appropriate machine learning algorithm for the task at hand
  • Predictive modelling - decision trees and clustering with Python and Spark MLlib


Instructor

Mr. Patrick Tsoi

  • Doctor of Education (in progress), Hong Kong Baptist University
  • Master in IT in Education, University of Hong Kong
  • Bachelor of Engineering in System Engineering and Engineering Management, Chinese University of Hong Kong
  • Over 20 years in the IT training field, with work on complex projects applying data science and software development in finance, data science and quantitative analysis