Courses

  • 0 Lessons

    Big Data Orchestration & Workflow Management

    A theoretical course on handling data at scale and the tools needed to orchestrate big data systems and manage their workflows. Learners explore data and computing at scale and get a comprehensive overview of the distributed resource management ecosystem.

  • 0 Lessons

    Building Scalable Models in PySpark

    Learn how to optimize your code and speed up existing data processing with PySpark. In this course, students work through best practices for how and when to use PySpark, explore what it can do, and learn how to use distributed computing within it.

  • 0 Lessons

    Distributed Data Storage (Hadoop)

    A course covering the theory of distributed data storage systems and their implementation on a specific cloud platform. Learners explore how data is stored and processed at scale using tools like Hadoop on the selected cloud platform, and build a foundation for creating and managing distributed data storage resources.

  • 0 Lessons

    Foundations of Big Data

    A theoretical course on handling data at scale and the tools needed for distributed data storage, analysis, and management. Learners explore data and computing at scale and get a comprehensive overview of distributed computing.

  • 0 Lessons

    Introduction to Scala Collections

    A 4-hour course for intermediate-level data scientists / engineers that covers the key elements and different types of Scala collections.

  • 0 Lessons

    Spark Data Structures & Parallelism

    A 4-hour course for intermediate-level data scientists / engineers that covers Spark architecture and fundamentals, including RDDs, DataFrames, and Datasets.

  • 0 Lessons

    Spark Partitioning & Optimization

    A 6-hour course for intermediate-level data scientists / engineers that covers Spark partitions, benchmarking, performance optimization, and monitoring.