Accelerating CUDA C++ Applications with Multiple GPUs

Discover how to write CUDA C++ applications that efficiently and correctly utilize all available GPUs in a single node, dramatically improving the performance of applications and making the most cost-effective use of systems with multiple GPUs.

DataSociety

8 hours of instruction

OBJECTIVES

Use concurrent CUDA Streams to overlap memory transfers with GPU computation
Utilize all available GPUs on a single node to scale workloads
Combine the use of copy/compute overlap with multiple GPUs
Rely on the NVIDIA Nsight Systems Visual Profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop

PREREQUISITES

None

SYLLABUS & TOPICS COVERED

Introduction and Using JupyterLab
- Meet the instructor and get familiar with your GPU-accelerated interactive JupyterLab environment
Application Overview
- Orient yourself with a single-GPU CUDA C++ application that will be the starting point for the course
- Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems
Introduction to CUDA Streams
- Learn the rules that govern concurrent CUDA stream behavior
- Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers
- Utilize multiple CUDA streams for launching GPU kernels
- Observe multiple streams in the Nsight Systems Visual Profiler timeline view
Copy/Compute Overlap with CUDA Streams
- Learn the key concepts for effectively performing copy/compute overlap
- Explore robust indexing strategies for the flexible use of copy/compute overlap in applications
- Refactor the single-GPU CUDA C++ application to perform copy/compute overlap
- See copy/compute overlap in the Nsight Systems Visual Profiler timeline
Multiple GPUs with CUDA C++
- Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++
- Explore robust indexing strategies for the flexible use of multiple GPUs in applications
- Refactor the single-GPU CUDA C++ application to utilize multiple GPUs
- See multiple GPU utilization in the Nsight Systems Visual Profiler timeline
Copy/Compute Overlap with Multiple GPUs
- Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs
- Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs
- Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs
- Observe performance benefits for copy/compute overlap on multiple GPUs
- See copy/compute over
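
The syllabus mentions "robust indexing strategies" without spelling them out. As a rough illustration only, and not the course's own material, the host-side C++ sketch below shows one common approach: split an array into contiguous chunks with a rounded-up chunk size and a clamped upper bound, so the last chunk never runs past the end of the data. The names `Chunk` and `split_range` are hypothetical and introduced here purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a half-open range [lower, upper) of element indices.
struct Chunk {
    std::size_t lower;
    std::size_t upper;
    std::size_t width() const { return upper - lower; }
};

// Split n elements into exactly num_chunks contiguous chunks.
// The chunk size is rounded up so every element is covered, and both bounds
// are clamped to n so trailing chunks may be short or empty but never
// out of range. `offset` shifts the result, which allows a second-level
// split (for example, per-stream chunks inside one GPU's share of the data).
std::vector<Chunk> split_range(std::size_t n, std::size_t num_chunks,
                               std::size_t offset = 0) {
    assert(num_chunks > 0);
    const std::size_t chunk_size = (n + num_chunks - 1) / num_chunks;  // ceil(n / num_chunks)
    std::vector<Chunk> chunks;
    chunks.reserve(num_chunks);
    for (std::size_t i = 0; i < num_chunks; ++i) {
        const std::size_t lower = std::min(i * chunk_size, n);
        const std::size_t upper = std::min(lower + chunk_size, n);
        chunks.push_back(Chunk{offset + lower, offset + upper});
    }
    return chunks;
}
```

Under these assumptions, `split_range(10, 4)` yields widths 3, 3, 3, and 1. For the multi-GPU case, one could first split across GPUs and then call `split_range(gpu_chunk.width(), num_streams, gpu_chunk.lower)` to get per-stream chunks within each GPU's share. In a CUDA application, each chunk's `lower` and `width()` would typically become the pointer offset and element count for an asynchronous copy such as `cudaMemcpyAsync` and for the matching kernel launch.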

SOFTWARE REQUIREMENTS

Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

About Instructor

DataSociety

148 Courses
