8 hours of instruction
Explore how to employ advanced tools and techniques with GPUs to significantly improve data engineering pipelines.
OBJECTIVES
- How data moves within a computer
- How to strike the right balance between CPU, DRAM, disk storage, and GPUs
- How different file formats are read and manipulated at the hardware level
- How to scale an ETL pipeline with multiple GPUs using NVTabular
- How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second
PREREQUISITES
None
SYLLABUS & TOPICS COVERED
- Introduction
- Meet the instructor
- Create an account at courses.nvidia.com/join
- Data On The Hardware Level
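As a rough illustration of the data-movement ideas this module discusses (an illustrative sketch, not workshop code): timing a copy from host DRAM into GPU memory, assuming CuPy is available. The array size is arbitrary.

```python
import time

import cupy as cp
import numpy as np

# Allocate ~200 MB in host DRAM
host = np.random.rand(50_000_000).astype(np.float32)

# Time the host-to-device copy over the PCIe/NVLink bus
t0 = time.perf_counter()
device = cp.asarray(host)
cp.cuda.Stream.null.synchronize()  # wait until the transfer completes
t1 = time.perf_counter()

print(f"Copied {host.nbytes / 1e9:.2f} GB to the GPU in {t1 - t0:.3f} s")
```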
- Pandas and cuDF
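A minimal sketch of the Pandas-to-cuDF handoff this module covers: cuDF mirrors much of the Pandas API, so the same code runs on the GPU with little change. The column names and sizes below are illustrative.

```python
import pandas as pd
import cudf  # RAPIDS GPU DataFrame library

# Build a DataFrame in host memory with Pandas
pdf = pd.DataFrame({"key": [i % 7 for i in range(1_000_000)],
                    "value": range(1_000_000)})

# Copy it to GPU memory; the API stays almost identical
gdf = cudf.from_pandas(pdf)

# The same groupby-aggregate now executes on the GPU
print(gdf.groupby("key")["value"].mean())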
- Dask
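A sketch of scaling the same DataFrame work out with Dask, assuming (as the multi-GPU objective above suggests) a dask-cuda cluster with one worker per GPU; the file glob and column names are hypothetical.

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# Spin up one Dask worker per visible GPU
cluster = LocalCUDACluster()
client = Client(cluster)

# Read many files as one partitioned GPU DataFrame; work is lazy
ddf = dask_cudf.read_csv("data/part-*.csv")  # hypothetical paths

# .compute() triggers execution across all GPU workers
print(ddf.groupby("key")["value"].mean().compute())
```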
- ETL With NVTabular
- Transform raw JSON into analysis-ready Parquet files
- Learn how to quickly add features to a dataset using operators such as Categorify and LambdaOp, as sketched below
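A compressed sketch of this JSON-to-Parquet flow with the two operators named above; the file path, column names, and the clipping transform are illustrative assumptions, not the workshop's actual pipeline.

```python
import cudf
import nvtabular as nvt
from nvtabular import ops

# Read raw JSON-lines records into a GPU DataFrame (hypothetical file)
gdf = cudf.read_json("raw/events.jsonl", lines=True)

# Categorify encodes string columns as integer ids;
# LambdaOp applies an arbitrary per-column function
cats = ["user_id", "item_id"] >> ops.Categorify()
conts = ["price"] >> ops.LambdaOp(lambda col: col.clip(lower=0))

# Fit the workflow (learns the category mappings), then write Parquet
workflow = nvt.Workflow(cats + conts)
dataset = nvt.Dataset(gdf)
workflow.fit(dataset)
workflow.transform(dataset).to_parquet("analysis_ready/")
```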
- Data Visualization
- Learn how to use descriptive statistics and plots such as histograms to assess data quality
- Learn effective memory usage so users can quickly filter data through a graphical interface (sketched below)
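A toy Plotly Dash version of the filter-and-redraw loop described above, using a small synthetic Pandas frame; in the workshop setting the filter step would be GPU-backed to keep multi-million-row interactions under a second.

```python
import numpy as np
import pandas as pd
import plotly.express as px
from dash import Dash, Input, Output, dcc, html

# Synthetic stand-in for the workshop dataset
df = pd.DataFrame({"value": np.random.randn(100_000)})

app = Dash(__name__)
app.layout = html.Div([
    dcc.RangeSlider(id="value-range", min=-4, max=4, step=0.5, value=[-1, 1]),
    dcc.Graph(id="value-hist"),
])

@app.callback(Output("value-hist", "figure"), Input("value-range", "value"))
def redraw(bounds):
    lo, hi = bounds
    # Server-side filter; swapping the Pandas frame for cuDF moves this to the GPU
    subset = df[(df["value"] >= lo) & (df["value"] <= hi)]
    # The histogram doubles as a quick data-quality check on the filtered slice
    return px.histogram(subset, x="value", nbins=50)

if __name__ == "__main__":
    app.run(debug=True)
```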
- Final Project
- Review a dashboard
- Apply the techniques learned in class to find and eliminate inefficiencies in the backend code
- Final Review
- Review key learnings and answer questions
- Complete the assessment, earn your certificate, and fill out the workshop survey
- Learn how to set up your own AI application development environment
SOFTWARE REQUIREMENTS
Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.