Module overview
This module examines the types of parallel compute device found in modern systems, and how they can be used to improve performance, reduce energy consumption, or both. The devices considered include multi-core CPUs, many-core CPUs, SIMD extensions, GPUs, NPUs/TPUs, and other evolving types of compute hardware. The particular focus of this module is on how to program and optimise for such devices. This means analysing workloads for bottlenecks, choosing or designing parallel algorithms, and then profiling and testing the results. In most cases several types of parallelism will be applied at once, combining thread-level, SIMD, pipeline, and task-level parallelism. After completing the module, students should be able to tackle the acceleration of fairly complex programs, realistically achieving speedups of 10x to 100x over baseline implementations.