Students will be able to analyze the computing and memory architecture of a super computing node and use OpenMP directives to improve vectorization of their programs. This module focuses on the key ...
A hands-on introduction to parallel programming and optimizations for 1000+ core GPU processors, their architecture, the CUDA programming model, and performance analysis. Students implement various ...