Documents
Knuth paper on premature optimization, 1974.
Dongarra, Sterling, Simon & Strohmaier paper, 2005.
Glaskowsky, whitepaper on Fermi architecture.
Intel’s paper on debunking GPU performance, 2010.
Vuduc et al. paper on GPU-CPU performance comparison, 2010.
Bell & Garland, 2009. Conference paper, or longer technical report.
Better Performance at Lower Occupancy - Vasily Volkov.
Microsoft’s .NET Task Parallel Library.
Syllabus & Student Feedback
Assignments
Available via Mercurial:
hg clone http://sbel.wisc.edu/Courses/ME964/2012/Assignments
Also available via Git:
git clone http://sbel.wisc.edu/Courses/ME964/2012/Assignments/.git
Note: Do not use the files below in your Git/Mercurial repositories. They will be added to your repository once you do a pull/update.
Midterm Project: Default Option (banded linear solver) - Due: 04/12. pdf.
Assignment 12 - Due: 04/29 pdf.
Assignment 11 - Due: 04/22 pdf.
Assignment 10 - Due: 04/12 pdf.
Assignment 09 - Due: 03/22 pdf.
Assignment 08 - Due: 03/22 zip.
Assignment 07 - Due: 03/15 zip.
Assignment 06 - Due: 03/08 zip.
Assignment 05 - Due: 03/01 zip.
Assignment 04 - Due: 02/23 pdf. Source files: 1 2 3 4
Assignment 03 - Due: 02/16 pdf.
Assignment 02 - Due: 02/09 pdf. Prefix Scan 1990 paper of Blelloch
Assignment 01 - Due: 02/02 pdf.
Resources
Fermi Architecture Overview, 2010.
cuda-gdb Debugger. User Manual.
Tutorial, C Programming Language
Tutorial, C++ Programming Language
OpenMP 3.0 Application Programming Interface
Lectures [PPTX, PDF, VIDEO]
05-08-2012 - Practical CUDA Programming: git, CMake, trac, MATLAB/C++ interfacing. Lecture video.
05-03-2012 - Parallel Programming patterns. Lecture video.
05-01-2012 - Data scoping Example. OpenMP API. CUDA, OpenMP, MPI: departing thoughts. Lecture video.
04-26-2012 - Sections and Tasks in OpenMP. Data scoping. OpenMP Synchronization. Lecture video.
04-24-2012 - Wrap-up, Derived Datatypes in MPI. Parallel computing with OpenMP, intro. Lecture video.
04-19-2012 - Wrap-up, Collective Communication support in MPI. Lecture video.
04-17-2012 - Midterm Exam. No class.
04-12-2012 - Non-blocking Send/Receive Operations. Collective Communication support in MPI. Lecture video.
04-10-2012 - Blocking Send/Receive Operations. Building and Debugging MPI code on Euler. Building MPI on Euler. Debugging MPI code on Euler. Lecture video.
03-29-2012 - Parallel Computing using the Message Passing Interface approach. Introduction. Lecture video.
03-27-2012 - Wrap-up, GPU computing with Thrust. The CUDA ecosystem. GPU Computing wrap-up. Lecture video.
03-22-2012 - CUDA Streams, wrap-up. GPU computing with Thrust. Lecture video.
03-20-2012 - Parallel Prefix Scan in CUDA. CUDA Streams. Overlapping data movement and execution in CUDA. Lecture video.
03-15-2012 - CUDA Execution Configuration and Instruction Optimization Heuristics. CUDA Optimization Wrapup. Lecture video.
03-13-2012 - Tiling in CUDA. Array Reduction. Lecture video.
03-08-2012 - CUDA Shared Memory. Synchronization. Atomic operations. Lecture video.
03-06-2012 - CUDA Scheduling Issues. Global Memory Access in CUDA. Lecture video.
03-01-2012 - CUDA Profiling. Debugging and Profiling Example. Example Code & Script. Lecture video.
02-28-2012 - CUDA Debugging: cuda-gdb and cuda-memcheck. Lecture video.
02-23-2012 - CUDA Execution Scheduling Issues. Lecture video.
02-21-2012 - CUDA Memory Ecosystem. Lecture video.
02-16-2012 - CUDA execution configuration and CUDA API. Lecture video.
02-14-2012 - Intro, CMake. CUDA execution configuration. Andrew’s screencast on CMake. Lecture video.
02-09-2012 - Intro, GPU Computing. Lecture video.
02-07-2012 - Parallel Computing Overview. Lecture video.
02-02-2012 - The Eclipse IDE; Parallel Computing: why and why now? With Video:pptx Lecture video.
01-31-2012 - Quick Overview of C Programming. Debugging with gdb. Version control with Mercurial. Logging into Euler. Lecture video.
01-26-2012 - Quick Overview of C Programming. Lecture video.
01-24-2012 - ME964 Syllabus. Course overview. Lecture video.

