8 hours of instruction
Discover how to write CUDA C++ applications that efficiently and correctly utilize all available GPUs in a single node, dramatically improving the performance of applications and making the most cost-effective use of systems with multiple GPUs.
OBJECTIVES
- Use concurrent CUDA Streams to overlap memory transfers with GPU computation
- Scale workloads across all available GPUs on a single node
- Combine the use of copy/compute overlap with multiple GPUs
- Use the NVIDIA Nsight Systems timeline to observe improvement opportunities and the impact of the techniques covered in the workshop
PREREQUISITES
None
SYLLABUS & TOPICS COVERED
- Introduction And Using JupyterLab
- Meet the instructor and get familiar with your GPU-accelerated interactive JupyterLab environment
- Application Overview
- Orient yourself with a single-GPU CUDA C++ application that will be the starting point for the course
- Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems
- Introduction To CUDA Streams
- Learn the rules that govern concurrent CUDA Stream behavior
- Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers
- Utilize multiple CUDA streams for launching GPU kernels
- Observe multiple streams in the Nsight Systems timeline view
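A minimal sketch of the stream behavior covered in this section (the `addOne` kernel and buffer sizes are illustrative, not from the course materials): each stream is issued its own host-to-device copy, kernel launch, and device-to-host copy, and work in different streams may execute concurrently.

```cuda
#include <cuda_runtime.h>

__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int N = 1 << 20;
    const size_t bytes = N * sizeof(float);

    float *h[2], *d[2];
    cudaStream_t stream[2];
    for (int s = 0; s < 2; ++s) {
        cudaMallocHost(&h[s], bytes);   // pinned host memory is required for async copies
        cudaMalloc(&d[s], bytes);
        cudaStreamCreate(&stream[s]);
        for (int i = 0; i < N; ++i) h[s][i] = 0.0f;
    }

    // Operations within one stream run in issue order; operations in
    // different streams have no ordering guarantee and may overlap.
    for (int s = 0; s < 2; ++s) {
        cudaMemcpyAsync(d[s], h[s], bytes, cudaMemcpyHostToDevice, stream[s]);
        addOne<<<(N + 255) / 256, 256, 0, stream[s]>>>(d[s], N);
        cudaMemcpyAsync(h[s], d[s], bytes, cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();            // wait for all streams to finish

    for (int s = 0; s < 2; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d[s]);
        cudaFreeHost(h[s]);
    }
    return 0;
}
```

Profiling this program with Nsight Systems shows the two streams' transfers and kernels on separate timeline rows.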
- Copy/Compute Overlap With CUDA Streams
- Learn the key concepts for effectively performing copy/compute overlap
- Explore robust indexing strategies for the flexible use of copy/compute overlap in applications
- Refactor the single-GPU CUDA C++ application to perform copy/compute overlap
- See copy/compute overlap in the Nsight Systems timeline
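The indexing strategy this section covers can be sketched as follows (a hypothetical `scale` kernel; the chunk count and even-division assumption are illustrative): the data is split into chunks, and each chunk's copy-in, kernel, and copy-out are issued to its own stream, so one chunk's transfer can overlap another chunk's compute.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    const int N = 1 << 22;
    const int numStreams = 4;
    const int chunk = N / numStreams;          // assumes N divides evenly
    const size_t chunkBytes = chunk * sizeof(float);

    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));     // pinned for async copies
    cudaMalloc(&d, N * sizeof(float));
    for (int i = 0; i < N; ++i) h[i] = 1.0f;

    cudaStream_t streams[numStreams];
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    // Chunk s+1's host-to-device copy can overlap chunk s's kernel,
    // hiding much of the transfer time behind computation.
    for (int s = 0; s < numStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk, 2.0f);
        cudaMemcpyAsync(h + off, d + off, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```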
- Multiple GPUs With CUDA C++
- Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++
- Explore robust indexing strategies for the flexible use of multiple GPUs in applications
- Refactor the single-GPU CUDA C++ application to utilize multiple GPUs
- See multiple-GPU utilization in the Nsight Systems timeline
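A sketch of the single-node multi-GPU pattern this section covers (the `addOne` kernel and even data split are illustrative assumptions): `cudaSetDevice` selects the target GPU for subsequent runtime calls, and each GPU is given its own slice of the data.

```cuda
#include <cuda_runtime.h>

__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    int numGpus;
    cudaGetDeviceCount(&numGpus);

    const int N = 1 << 22;
    const int perGpu = N / numGpus;          // assumes N divides evenly

    float *h;
    cudaMallocHost(&h, N * sizeof(float));   // pinned for async copies
    for (int i = 0; i < N; ++i) h[i] = 0.0f;

    // One device buffer per GPU; each GPU processes its own slice.
    float **d = new float*[numGpus];
    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);                    // subsequent calls target GPU g
        cudaMalloc(&d[g], perGpu * sizeof(float));
        cudaMemcpyAsync(d[g], h + g * perGpu, perGpu * sizeof(float),
                        cudaMemcpyHostToDevice);
        addOne<<<(perGpu + 255) / 256, 256>>>(d[g], perGpu);
        cudaMemcpyAsync(h + g * perGpu, d[g], perGpu * sizeof(float),
                        cudaMemcpyDeviceToHost);
    }
    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();             // wait for this GPU's work
        cudaFree(d[g]);
    }
    cudaFreeHost(h);
    delete[] d;
    return 0;
}
```

Because work issued to different devices proceeds independently, the GPUs compute their slices concurrently.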
- Copy/Compute Overlap With Multiple GPUs
- Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs
- Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs
- Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs
- Observe performance benefits of copy/compute overlap on multiple GPUs
- See copy/compute overlap on multiple GPUs in the Nsight Systems timeline
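The combination this section covers can be sketched by nesting the two earlier techniques (the `addOne` kernel, the two-streams-per-GPU choice, and the even data splits are illustrative assumptions): each GPU gets a slice of the data, and each GPU's slice is further chunked across its own streams for copy/compute overlap.

```cuda
#include <cuda_runtime.h>

__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    int numGpus;
    cudaGetDeviceCount(&numGpus);
    const int streamsPerGpu = 2;
    const int N = 1 << 22;
    const int perGpu = N / numGpus;            // assumes even division
    const int chunk = perGpu / streamsPerGpu;  // per-stream chunk on each GPU

    float *h;
    cudaMallocHost(&h, N * sizeof(float));     // pinned for async copies
    for (int i = 0; i < N; ++i) h[i] = 0.0f;

    float **d = new float*[numGpus];
    cudaStream_t *streams = new cudaStream_t[numGpus * streamsPerGpu];

    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);                      // target GPU g
        cudaMalloc(&d[g], perGpu * sizeof(float));
        for (int s = 0; s < streamsPerGpu; ++s) {
            cudaStream_t *stream = &streams[g * streamsPerGpu + s];
            cudaStreamCreate(stream);          // stream belongs to GPU g
            int off = g * perGpu + s * chunk;  // offset into the host array
            cudaMemcpyAsync(d[g] + s * chunk, h + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, *stream);
            addOne<<<(chunk + 255) / 256, 256, 0, *stream>>>(d[g] + s * chunk, chunk);
            cudaMemcpyAsync(h + off, d[g] + s * chunk, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, *stream);
        }
    }
    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();               // wait for all of GPU g's streams
        for (int s = 0; s < streamsPerGpu; ++s)
            cudaStreamDestroy(streams[g * streamsPerGpu + s]);
        cudaFree(d[g]);
    }
    cudaFreeHost(h);
    delete[] streams;
    delete[] d;
    return 0;
}
```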
SOFTWARE REQUIREMENTS
Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.