Accelerating Data Engineering Pipelines

8 hours of instruction

Explore how to use advanced data engineering tools and techniques with GPUs to significantly accelerate data engineering pipelines.

OBJECTIVES

How data moves within a computer
How to strike the right balance among CPU, DRAM, disk storage, and GPUs
How different file formats can be read and manipulated by hardware
How to scale an ETL pipeline with multiple GPUs using NVTabular
How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second

PREREQUISITES

None

SYLLABUS & TOPICS COVERED

Introduction
- Meet the instructor
- Create an account at courses.nvidia.com/join
Data On The Hardware Level
- pandas and cuDF
- Dask
ETL With NVTabular
- Transform raw JSON into analysis-ready Parquet files
- Learn how to quickly add features to a dataset using operators such as Categorify and Lambda
Data Visualization
- Learn how to use descriptive statistics and plots such as histograms to assess data quality
- Learn how to use memory efficiently so users can quickly filter data through a graphical interface
Final Project
- Review a dashboard
- Apply the techniques learned in class to find and eliminate inefficiencies in the backend code
Final Review
- Review key learnings and answer questions
- Complete the assessment, earn your certificate, and complete the workshop survey
- Learn how to set up your own AI application development environment
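
As a rough illustration of the ETL topic in the syllabus, the sketch below shows in plain Python what a Categorify-style step and a Lambda-style step do conceptually: the first maps categorical values to integer IDs, the second applies a user-supplied function to a column. This is not the NVTabular API. The sample records, the function names `categorify` and `lambda_op`, and the choice to reserve ID 0 for missing values are all assumptions made for this example.

```python
import json

# Hypothetical raw records, one JSON object per line (made up for illustration).
RAW_LINES = [
    '{"user": "ana", "city": "Lima", "amount": 12.5}',
    '{"user": "bo", "city": "Oslo", "amount": 3.0}',
    '{"user": "ana", "city": "Lima", "amount": 7.25}',
    '{"user": "cy", "city": null, "amount": 1.0}',
]


def categorify(values):
    """Map each distinct non-null value to an integer ID starting at 1.

    ID 0 is reserved for missing values (an assumption of this sketch).
    """
    mapping = {}
    encoded = []
    for value in values:
        if value is None:
            encoded.append(0)
            continue
        if value not in mapping:
            mapping[value] = len(mapping) + 1
        encoded.append(mapping[value])
    return encoded, mapping


def lambda_op(values, fn):
    """Apply a user-supplied function to every element of a column."""
    return [fn(v) for v in values]


# Parse the raw JSON lines into row-oriented records.
records = [json.loads(line) for line in RAW_LINES]

# Pivot rows into columns, the layout that columnar formats such as Parquet use.
columns = {key: [r[key] for r in records] for key in records[0]}

# Encode the categorical column and derive a new numeric feature.
city_ids, city_map = categorify(columns["city"])
amount_cents = lambda_op(columns["amount"], lambda x: int(round(x * 100)))
```

In the workshop itself these steps would run on GPU dataframes rather than Python lists; the sketch only shows the shape of the transformation.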

SOFTWARE REQUIREMENTS

Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.
