Manferdelli paper, 2007.

Amdahl paper, 1967.

Knuth paper on premature optimization, 1974.

Dongarra, Sterling, Simon & Strohmaier paper, 2005.

Nickolls & Dally paper, 2010.

Glaskowsky, whitepaper on Fermi architecture.

Volkov & Demmel paper, 2008.

Intel’s paper debunking the 100X GPU vs. CPU performance myth, 2010.

Vuduc et al. paper on GPU-CPU performance comparison, 2010.

Bell & Garland, 2009. Conference paper, or longer technical report.

Thrust - GPU Gems - PDF.

Better Performance at Lower Occupancy - Vasily Volkov.

Microsoft’s .NET Task Parallel Library.

Syllabus & Student Feedback


Mid-semester student feedback


Available via Mercurial:

hg clone

Also available via Git (but may not be up to date):

git clone

Note: Do not use the files below in your Git/Mercurial repositories. They will be added to your repository once you do a pull/update.

Midterm Project: Default Option (banded linear solver) - Due: 04/12. pdf.

Assignment 12 - Due: 04/29 pdf.

Assignment 11 - Due: 04/22 pdf.

Assignment 10 - Due: 04/12 pdf.

Assignment 09 - Due: 03/22 pdf.

Assignment 08 - Due: 03/22 zip.

Assignment 07 - Due: 03/15 zip.

Assignment 06 - Due: 03/08 zip.

Assignment 05 - Due: 03/01 zip.

Assignment 04 - Due: 02/23 pdf. Source files: 1 2 3 4

Assignment 03 - Due: 02/16 pdf.

Assignment 02 - Due: 02/09 pdf. Prefix Scan 1990 paper of Blelloch

Assignment 01 - Due: 02/02 pdf.


Video recording of lectures

ME964 Forum

CUDA Online Documentation

CUDA Programming Guide

CUDA C Best Practices Guide

CUDA Forum on NVIDIA website

Fermi Architecture Overview, 2010.

cuda-gdb Debugger. User Manual.

Tutorial, C Programming Language

C Programming Language

Tutorial, C++ Programming Language

GPU Gems 3

GPU Gems 2

GPU Gems

OpenMP 3.0 Application Programming Interface

Lectures [PPTX, PDF, VIDEO]

05-08-2012 - Practical CUDA Programming: git, CMake, trac, MATLAB/C++ interfacing. Lecture video.

05-03-2012 - Parallel Programming patterns. Lecture video.

05-01-2012 - Data scoping Example. OpenMP API. CUDA, OpenMP, MPI: departing thoughts. Lecture video.

04-26-2012 - Sections and Tasks in OpenMP. Data scoping. OpenMP Synchronization. Lecture video.

04-24-2012 - Wrap-up, Derived Datatypes in MPI. Parallel computing with OpenMP, intro. Lecture video.

04-19-2012 - Wrap-up, Collective Communication support in MPI. Lecture video.

04-17-2012 - Midterm Exam. No class.

04-12-2012 - Non-blocking Send/Receive Operations. Collective Communication support in MPI. Lecture video.

04-10-2012 - Blocking Send/Receive Operations. Building and Debugging MPI code on Euler. Building MPI on Euler (Lectures/mpiGettingStarted.mp4). Debugging MPI code on Euler. Lecture video.

03-29-2012 - Parallel Computing using the Message Passing Interface approach. Introduction. Lecture video.

03-27-2012 - Wrap-up, GPU computing with Thrust. The CUDA ecosystem. GPU Computing wrap-up. Lecture video.

03-22-2012 - CUDA Streams, wrap-up. GPU computing with Thrust. Lecture video.

03-20-2012 - Parallel Prefix Scan in CUDA. CUDA Streams. Overlapping data movement and execution in CUDA. Lecture video.

03-15-2012 - CUDA Execution Configuration and Instruction Optimization Heuristics. CUDA Optimization Wrapup. Lecture video.

03-13-2012 - Tiling in CUDA. Array Reduction. Lecture video.

03-08-2012 - CUDA Shared Memory. Synchronization. Atomic operations. Lecture video.

03-06-2012 - CUDA Scheduling Issues. Global Memory Access in CUDA. Lecture video.

03-01-2012 - CUDA Profiling. Debugging and Profiling Example. [Example Code & Script.] (Documents/ Lecture video.

02-28-2012 - CUDA Debugging: cuda-gdb and cuda-memcheck. Lecture video.

02-23-2012 - CUDA Execution Scheduling Issues. Lecture video.

02-21-2012 - CUDA Memory Ecosystem. Lecture video.

02-16-2012 - CUDA execution configuration and CUDA API. Lecture video.

02-14-2012 - Intro, CMake. CUDA execution configuration. Andrew’s screencast on CMake. Lecture video.

02-09-2012 - Intro, GPU Computing. Lecture video.

02-07-2012 - Parallel Computing Overview. Lecture video.

02-02-2012 - The Eclipse IDE; Parallel Computing: why and why now? With Video:pptx Lecture video.

01-31-2012 - Quick Overview of C Programming. Debugging with gdb. Version control with Mercurial. Logging into Euler. Lecture video.

01-26-2012 - Quick Overview of C Programming. Lecture video.

01-24-2012 - ME964 Syllabus. Course overview. Lecture video.

© Simulation Based Engineering Laboratory, Dan Negrut 2016.

SBEL is led by Mead Witter Foundation Professor Dan Negrut in the Department of Mechanical Engineering at UW-Madison.
