Scaling CUDA C++ Applications to Multiple Nodes

Learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.

8 hours of instruction

OBJECTIVES

  1. Learn several methods for writing multi-GPU CUDA C++ applications
  2. Use a variety of multi-GPU communication patterns and understand their tradeoffs
  3. Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM
  4. Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers
  5. Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges
  6. Explore scaling considerations for a variety of GPU-cluster configurations
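The SPMD paradigm named in objective 3 pairs each MPI rank with one GPU, with every rank running the same program on its own slice of the data. A minimal sketch of this pattern, assuming a CUDA-aware MPI build (so device pointers can be passed directly to MPI calls without host staging) — the kernel and ring exchange are illustrative, not taken from the course:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Trivial kernel standing in for each rank's real per-GPU work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // SPMD: same program everywhere; each rank claims one local GPU.
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);
    cudaSetDevice(rank % num_gpus);

    const int n = 1 << 20;
    float *d_data, *d_recv;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_recv, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    // With CUDA-aware MPI, device pointers go straight into MPI calls:
    // here, a ring exchange with the neighbouring ranks.
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;
    MPI_Sendrecv(d_data, n, MPI_FLOAT, next, 0,
                 d_recv, n, MPI_FLOAT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_recv);
    cudaFree(d_data);
    MPI_Finalize();
    return 0;
}
```

A program like this would typically be compiled with `nvcc` against an MPI installation and launched with `mpirun -np <ranks>`; the same binary then scales from one node to many without code changes, which is the point of the SPMD approach.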

PREREQUISITES

None

SYLLABUS & TOPICS COVERED

  1. Introduction
    • Meet the instructor
    • Create an account
  2. Multi-GPU Programming Paradigms
    • Use CUDA to utilize multiple GPUs
    • Learn how to enable and use direct peer-to-peer memory communication
    • Write an SPMD version with CUDA-aware MPI
  3. Introduction To NVSHMEM
    • Use NVSHMEM to write SPMD code for multiple GPUs
    • Utilize symmetric memory to let all GPUs access data on other GPUs
    • Make GPU-initiated memory transfers
  4. Halo Exchanges With NVSHMEM
    • Write an NVSHMEM implementation of a Laplace equation Jacobi solver
    • Refactor a single GPU 1D wave equation solver with NVSHMEM
    • Complete the assessment and earn a certificate
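The peer-to-peer communication covered in section 2 refers to the CUDA runtime's ability to let one GPU read and write another GPU's memory directly (over NVLink or PCIe), bypassing host memory. A hedged sketch of enabling and using it, using only standard CUDA runtime calls — the buffer size is arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) { printf("this sketch needs two GPUs\n"); return 0; }

    // Ask the runtime whether the two devices can access each other.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);

    if (can01 && can10) {
        // Peer access is one-directional; enable it both ways
        // (the flags argument must currently be 0).
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }

    const size_t bytes = 1 << 20;
    float *d0, *d1;
    cudaSetDevice(0); cudaMalloc(&d0, bytes);
    cudaSetDevice(1); cudaMalloc(&d1, bytes);

    // Direct GPU-to-GPU copy; with peer access enabled, this moves
    // data over the interconnect without staging through the host.
    cudaMemcpyPeer(d1, 1, d0, 0, bytes);

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```

Once peer access is enabled, kernels running on one device can also dereference pointers allocated on the other, which is what makes patterns beyond explicit copies possible.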
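Sections 3 and 4 combine NVSHMEM's symmetric memory model with GPU-initiated transfers to implement halo exchanges. The sketch below illustrates the shape of that combination under assumptions of my own (a 1D domain decomposition with one chunk per processing element, halo cells at each end of the buffer, and one GPU per PE); it follows the pattern of NVIDIA's NVSHMEM examples but is not code from the course:

```cuda
#include <nvshmem.h>
#include <nvshmemx.h>
#include <cuda_runtime.h>

#define N 1024  // interior points per PE (hypothetical decomposition)

// GPU-initiated halo exchange: a single thread on each PE writes its
// last interior value into the left halo slot of the next PE's copy
// of the symmetric buffer — no host involvement in the transfer.
__global__ void exchange_halo(float *u, int mype, int npes) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int right = (mype + 1) % npes;
        // Layout: u[0] left halo, u[1..N] interior, u[N+1] right halo.
        nvshmem_float_p(&u[0], u[N], right);
    }
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Map each PE to a GPU on its node.
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);

    // Symmetric allocation: the same size on every PE, and remotely
    // accessible by every other PE — this is what lets the kernel
    // above write directly into a neighbour's buffer.
    float *u = (float *) nvshmem_malloc((N + 2) * sizeof(float));

    exchange_halo<<<1, 32>>>(u, mype, npes);
    nvshmemx_barrier_all_on_stream(0);  // order the puts before reuse
    cudaDeviceSynchronize();

    nvshmem_free(u);
    nvshmem_finalize();
    return 0;
}
```

A program like this would be built with `nvcc -rdc=true` and linked against NVSHMEM, then launched with a multi-process launcher such as `nvshmrun` or `mpirun`; the same put-into-a-neighbour's-halo pattern is the core of the Jacobi and wave-equation solvers the syllabus describes.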

SOFTWARE REQUIREMENTS

Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

About Instructor

DataSociety
