Courses

Big Data Orchestration & Workflow Management

OpenTeams

A theoretical course on handling data at scale and on the tools needed to orchestrate big data systems and manage their workflows. Learners will dive into data and computing at scale and get a comprehensive overview of the distributed resource management ecosystem.

Building Scalable Models in PySpark

OpenTeams

Learn how to optimize your code and speed up existing data processing with PySpark. In this course, students work through best practices for how and when to use PySpark, explore what it can do, and learn how to apply distributed computing within it.

Distributed Data Storage (Hadoop)

OpenTeams

A course covering the theory of distributed data storage systems and their implementation on a specific cloud platform. Learners will dive into storing and processing data at scale using tools like Hadoop on a selected cloud platform. The course gives students a solid foundation for creating and managing distributed data storage resources.

Foundations of Big Data

OpenTeams

A theoretical course on handling data at scale and on the tools needed for distributed data storage, analysis, and management. Learners will dive into data and computing at scale and get a comprehensive overview of distributed computing.

Introduction to Scala Collections

DataSociety

A 4-hour course for intermediate-level data scientists and engineers that covers the key elements and different types of Scala collections.

Spark Data Structures & Parallelism

DataSociety

A 4-hour course for intermediate-level data scientists and engineers that covers Spark architecture and fundamentals, including RDDs, DataFrames, and Datasets.

Spark Partitioning & Optimization

DataSociety

A 6-hour course for intermediate-level data scientists and engineers that covers Spark partitions, benchmarking, performance optimization, and monitoring.

Committed to your success with open source. OpenTeams is your easy point of access to a range of services from our open source expert network, from commercial open source support to open source training, staffing & recruiting services, and more.