Spark Partitioning & Optimization

A 6-hour course for intermediate-level data scientists / engineers that covers Spark partitions, benchmarking, performance optimization and monitoring…

Spark Data Structures & Parallelism

A 4-hour course for intermediate-level data scientists / engineers that covers Spark architecture and fundamentals including RDDs, DataFrames, Dataset…

Introduction to Scala Collections

A 4-hour course for intermediate-level data scientists / engineers that covers the key elements and different types of Scala collections.

Building Scalable Models in PySpark

Learn how to optimize your code and to speed up current data processing using PySpark. In this course, students will work through best practices of…

Distributed Data Storage (Hadoop)

A course covering the theory of distributed data storage systems and their implementation on a specific cloud platform. Learners will be a…

Foundations of Big Data

A theoretical course covering how to handle data at scale and the different tools needed for distributed data storage, analysis, and mana…