8 hours of instruction
Learn how to automate your entire data pipeline with Apache Airflow. In this course, students will learn how to programmatically author, schedule, and monitor their workflows. Students will also learn how to create an environment to containerize, replicate, and deploy a pipeline.
OBJECTIVES
- Navigate through the growing landscape of automation tools for DataOps and MLOps
- Acquire foundational knowledge of Airflow components
- Set up a simple Airflow pipeline
- Distinguish among the various Airflow setup options on cloud platforms
- Create and test a data pipeline using Airflow with Docker on a compute instance
PREREQUISITES
Introduction to MLOps Theory
SYLLABUS & TOPICS COVERED
- Exploring Dev/Data/MLOps and Apache Airflow concepts
- Identify steps of the data science life cycle for automation
- Describe DevOps, DataOps and MLOps
- Name open source data pipeline automation tools
- Describe Apache Airflow and its use cases
- Learning components of Airflow and automating a simple workflow locally
- Explain DAGs and operators in Apache Airflow
- Describe the main components of Apache Airflow
- Set up an environment for Airflow
- Create the first DAG
- Further explore Airflow UI and access the task logs
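A first DAG like the one authored in this module can be sketched as a single Python file dropped into your Airflow dags/ folder. This is an illustrative sketch assuming an Airflow 2.x install; the dag_id, task ids, schedule, and commands are placeholders, not part of the course materials.

```python
# hello_pipeline.py -- minimal illustrative DAG (assumes Airflow 2.x).
# All names here (dag_id, task ids, schedule) are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder task body; a real pipeline would pull data here.
    print("extracting data")


with DAG(
    dag_id="hello_pipeline",        # appears in the Airflow UI
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",     # run once per day
    catchup=False,                  # do not backfill past runs
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = BashOperator(task_id="load", bash_command="echo loading")

    # The >> operator declares the dependency edge: load runs after extract.
    extract_task >> load_task
```

Once the scheduler picks the file up, the DAG and its task logs are browsable in the Airflow UI, as covered in the last two topics above.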
- Working with different types of Executors
- Run a DAG with SequentialExecutor
- Run a DAG with LocalExecutor
- Run a DAG with CeleryExecutor
- Monitor clusters for CeleryExecutor using Flower
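The executor is selected through Airflow configuration rather than in the DAG file. A hedged sketch of the relevant settings, assuming Airflow 2.x and using placeholder connection strings (the database, broker, and credentials shown are examples, not requirements):

```shell
# Executor choice lives in airflow.cfg or, equivalently, env vars.
# SequentialExecutor (the default with SQLite) runs one task at a time:
export AIRFLOW__CORE__EXECUTOR=SequentialExecutor

# LocalExecutor runs tasks in parallel on one machine and needs a real
# metadata database (connection string below is a placeholder):
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://user:pass@localhost/airflow

# CeleryExecutor distributes tasks to worker machines via a message
# broker (Redis or RabbitMQ; URL below is a placeholder):
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
export AIRFLOW__CELERY__BROKER_URL=redis://localhost:6379/0

# Flower, Celery's monitoring UI, can then be started from Airflow:
airflow celery flower
```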
- Running Airflow in Docker
- Describe Docker and its components
- Explore Docker images and containers
- Work with docker-compose
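A typical docker-compose workflow for this module looks like the sketch below. It assumes you have Docker installed and have downloaded the official docker-compose.yaml for your Airflow version from the Airflow documentation; the service name airflow-init comes from that file.

```shell
# Illustrative commands, assuming the official Airflow docker-compose.yaml
# is in the current directory.
mkdir -p ./dags ./logs ./plugins   # folders mounted into the containers
docker compose up airflow-init     # one-time DB migration and admin user setup
docker compose up -d               # start scheduler, webserver, and workers
docker compose ps                  # check that containers report healthy
docker compose down                # stop everything when finished
```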
SOFTWARE REQUIREMENTS
Apache Airflow, Docker, Terminal, VS Code