6 hours of instruction
This course covers the theory and implementation of distributed data storage systems on a specific cloud platform. Learners will explore how data is stored and processed at scale using tools such as Hadoop on the selected cloud platform, building a solid foundation for creating and managing distributed data storage resources.
OBJECTIVES
- Understand the need for distributed data storage
- Survey the components of the Hadoop architecture
- Learn to deploy a Hadoop application on a cloud service
- Explore cross-functional tools used in conjunction with Hadoop
PREREQUISITES
Foundations of Big Data
SYLLABUS & TOPICS COVERED
- Apache Hadoop
  - Overview of Hadoop’s main layers
  - Introduce HDFS (Hadoop Distributed File System)
  - Discuss YARN and MapReduce
- Intro to HDFS
  - Overview and architecture of HDFS
  - Deploy a Hadoop application on a cluster
- Other Tools
  - Hadoop alternatives and other cross-functional tools used with Hadoop
SOFTWARE REQUIREMENTS
AWS EC2, Hadoop, Java, and Python. You will have access to a Python-based JupyterHub environment for this course. No additional download or installation is required.