Distributed Data Storage (Hadoop)

A course covering the theory of distributed data storage systems and their implementation on a specific cloud platform. Learners will explore how data is stored and processed at scale using tools such as Hadoop on a selected cloud platform. The course gives students a solid foundation for creating and managing distributed data storage resources.

6 hours of instruction

OBJECTIVES

  1. Understand the need for distributed data storage
  2. Gain an overview of the components of the Hadoop architecture
  3. Learn to deploy a Hadoop application on a cloud service
  4. Explore cross-functional tools used in conjunction with Hadoop

PREREQUISITES

Foundations of Big Data

SYLLABUS & TOPICS COVERED

  1. Apache Hadoop
    • Overview of Hadoop’s main layers
    • Introduce HDFS (Hadoop Distributed File System)
    • Discuss YARN and MapReduce
  2. Introduction to HDFS
    • Overview and architecture of HDFS
    • Deploy a Hadoop application on a cluster
  3. Other Tools
    • Hadoop alternatives and other cross-functional tools used with Hadoop
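The MapReduce model named in the syllabus can be illustrated with a minimal, single-process Python sketch of the classic word-count job. It only mimics the map, shuffle, and reduce phases in memory on one machine. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical illustrations, not Hadoop APIs. A real job would run these phases distributed across a cluster, for example through Hadoop MapReduce or Hadoop Streaming.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Hypothetical shuffle/sort step: group every value emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop sorts keys before the reduce phase; mimic that ordering here.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence, all in a single process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one input split that a separate mapper would process.
    splits = ["Hadoop stores data in HDFS", "YARN schedules Hadoop jobs"]
    print(word_count(splits))
```

In a cluster, each input split would go to its own mapper. The shuffle would then move all pairs with the same key to the same reducer, which is what lets the reduce phase run in parallel.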

SOFTWARE REQUIREMENTS

AWS EC2, Hadoop, Java, Python. You will have access to a Python-based JupyterHub environment for this course; no additional download or installation is required.

About Instructor

OpenTeams

This course is currently closed