Accelerating Data Engineering Pipelines

Learn how to use GPU-accelerated tools and techniques to speed up data engineering pipelines.

8 hours of instruction

OBJECTIVES

  1. How data moves within a computer
  2. How to strike the right balance between CPU, DRAM, disk storage, and GPUs
  3. How different file formats can be read and manipulated by hardware
  4. How to scale an ETL pipeline with multiple GPUs using NVTabular
  5. How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second

PREREQUISITES

None

SYLLABUS & TOPICS COVERED

  1. Introduction
    • Meet the instructor
    • Create an account at courses.nvidia.com/join
  2. Data On The Hardware Level
    • pandas and cuDF
    • Dask
  3. ETL With NVTabular
    • Transform raw JSON into analysis-ready Parquet files
    • Learn how to quickly add features to a dataset with operators such as Categorify and Lambda
  4. Data Visualization
    • Learn how to use descriptive statistics and plots such as histograms to assess data quality
    • Learn to use memory effectively so that users can quickly filter data through a graphical interface
  5. Final Project
    • Review a dashboard
    • Apply the techniques learned in class to find and eliminate inefficiencies in the backend code
  6. Final Review
    • Review key learnings and answer questions
    • Complete the assessment, earn your certificate, and complete the workshop survey
    • Learn how to set up your own AI application development environment
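
To give a feel for the ETL topic in the syllabus above, here is a minimal plain-Python sketch of what a Categorify-style and a Lambda-style step do conceptually. It is not the NVTabular API: the helper names (parse_json_lines, categorify, lambda_op), the sample records, the frequency-based id ordering, and the choice to reserve id 0 for missing values are all assumptions made for illustration. In the workshop these steps run on GPUs and write Parquet files; this sketch stays in memory on the CPU.

```python
import json
from collections import Counter

# Hypothetical sample input: newline-delimited JSON records.
RAW = """\
{"user": "ann", "city": "Austin", "price": "12.50"}
{"user": "bob", "city": "Boston", "price": "7.25"}
{"user": "cat", "city": "Austin", "price": "3.00"}
{"user": "dan", "city": null, "price": "9.75"}
"""


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of row dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def categorify(rows, column):
    """Replace each category in `column` with an integer id.

    Assumption of this sketch: ids start at 1 and are assigned by
    descending frequency (ties broken alphabetically); 0 marks missing values.
    """
    counts = Counter(r[column] for r in rows if r[column] is not None)
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    mapping = {value: i + 1 for i, value in enumerate(ordered)}
    encoded = [dict(r, **{column: mapping.get(r[column], 0)}) for r in rows]
    return encoded, mapping


def lambda_op(rows, column, fn):
    """Apply an arbitrary function to every value in `column`."""
    return [dict(r, **{column: fn(r[column])}) for r in rows]


rows = parse_json_lines(RAW)
rows, city_ids = categorify(rows, "city")   # strings -> integer ids
rows = lambda_op(rows, "price", float)      # string prices -> floats

for row in rows:
    print(row)
```

Integer category ids and numeric columns are the kind of analysis-ready output that a columnar format such as Parquet stores compactly, which is why steps like these usually come before the write.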

SOFTWARE REQUIREMENTS

Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

About Instructor

DataSociety
