Programming Massively Parallel Processors with CUDA
by Stanford University
Description
Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging on concurrent platforms. There are two important reasons for this trend. First, for many demanding applications, concurrent processors can potentially make more effective use of chip area and power than traditional monolithic microprocessors. Second, a growing number of applications that traditionally relied on Application Specific Integrated Circuits (ASICs) are now implemented on concurrent processors to improve functionality and reduce engineering cost. The real challenge is to develop application software that uses these concurrent processors effectively enough to meet efficiency and performance goals.

The aim of this course is to give students knowledge of, and hands-on experience in, developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already provide this level of concurrency. Programming these processors effectively requires in-depth knowledge of parallel programming principles, as well as of the parallelism models, communication models, and resource limitations of the processors themselves.

The course is aimed at students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for them. Visit the CS193G companion website for course materials.
| # | Name | Description | Released | Price |
|---|---|---|---|---|
| 1 | Video1. Introduction to Massively Parallel Computing (March 30, 2010) | science, technology, computer science, CS, software engineering, programming, parallel processors, CUDA, language, code, Moore's Law, chips, transistors, compiler, architecture, core, GPU, CPU, memory, threads, kernels, research | April 9, 2010 | Free |
| 2 | Video2. Introduction to CUDA (April 1, 2010) | science, technology, computer science, CS, software engineering, programming, parallel processors, CUDA, language, code, Computers, coding, MP0, MP1, hardware, software, memory management, GPU, CPU, memory, parallel code, kernel, threads, launch, thread b | April 14, 2010 | Free |
| 3 | Video3. CUDA Threads & Atomics (April 6, 2010) | Science, technology, computer science, CS, software engineering, programming, coding, parallel processors, CUDA, language, code, threads, kernel launch, global communication, grid, memory, hardware, sums, warps, control flow divergence, data | April 15, 2010 | Free |
| 4 | Video4. CUDA Memories (April 8, 2010) | science, technology, software engineering, computer science, CS, programming, parallel processors, CUDA, global memory, language, code, pointer, dereference, hardware, thread, variables, kernel, atomics, optimization, matrix multiplication, dot product | April 21, 2010 | Free |
| 5 | Video5. Performance Considerations (April 13, 2010) | computer science, technology, programming, math, engineering, code, global shared memory, byte, bandwidth, if switch statement, loop, thread, loop, chip, hardware, structure of arrays, SMEM, warp, thread-block, value, execute, data, Amdahl's Law, optimiza | April 21, 2010 | Free |
| 6 | Video6. Parallel Patterns I (April 15, 2010) | engineering, computer science, programming language, graphics processing unit, parallel patterns, reduce, block, scan, compact, input array, segmented scan, sort, kernel function, thread, CUDA, NVIDIA | May 27, 2010 | Free |
| 7 | Video7. Parallel Patterns II (April 22, 2010) | engineering, computer science, programming language, graphics processing unit, input array, segmented scan, sort, mapreduce, parallel, kernel function, map, radixsort, CUDA, | May 17, 2010 | Free |
| 8 | Video8. Introduction to Thrust (April 27, 2010) | engineering, computer science, programming language, graphics processing unit, Thrust, algorithms, iterator, namespace, container, structure of array, CUDA, C++, | May 17, 2010 | Free |
| 9 | Video9. Sparse Matrix Vector Operations (April 29, 2010) | engineering, computer science, programming language, graphics processing unit, linear system, sparse matrix vector multiplication, throughput, divergence, CUDA, GPU | May 17, 2010 | Free |
| 10 | Video10. Solving Partial Differential Equations with CUDA (May 4, 2010) | engineering, mathematics, computer science, programming, calculus, partial differential equation, poisson, linear, matrix, parallel processor, algorithm, Gauss-Seidel Relaxation, Laplacian, cyclic reduction, PDE, CUDA, NVIDIA | May 17, 2010 | Free |
| 11 | Video11. The Fermi Architecture (May 6, 2010) | engineering, computer science, programming, chip, graphics processing unit, Tessellation, Fermi architecture, language, limiter theory, bandwidth, space, thread, barrier, partition, GPU, CUDA, NVIDIA | May 17, 2010 | Free |
| 12 | Video12. NVIDIA OptiX: Ray Tracing on the GPU (May 11, 2010) | engineering, computer science, programming interface, graphics, optics, light, ray tracing, rasterization, processing unit, recursion, warp, SIMD, GPU, CUDA, NVIDIA, OptiX | May 26, 2010 | Free |
| 13 | Video13. Future of Throughput (May 13, 2010) | Technology, computers, performance, chips, parallelism, efficiency, locality, single-thread processors, stream processors, parallel arithmetic units, exposed storage hierarchy, stream programming, bulk operations, GPU computing, Intel, arithmetic units, ope | May 26, 2010 | Free |
| 14 | Video14. Path Planning System on the GPU (May 18, 2010) | engineering, computer science, programming language, graphics processing unit, ray tracing, path planning, multi agent, kernel, Fermi, GPU, NVIDIA, CUDA | June 9, 2010 | Free |
| 15 | Video15. Optimizing Parallel GPU Performance (May 20, 2010) | computer science, technology, programming, math, engineering, code, optimization, performance, memory, cache, computing, software, architecture, technology, thread block, RAM, parallelism, scalar processor, programming, data, SIMT warp execution, global, | June 9, 2010 | Free |
| 16 | Video16. Parallel Sorting (April 20, 2010) | computer science, technology, programming, math, engineering, code, optimization, parallelism, sorting, key sequence, memory, processor, CPU, GPU, data, search, thread, element, pair, Radix, code, performance, scalable program, merge, bit, sorting network | June 9, 2010 | Free |
| Total: 16 Episodes |











