AITC Wiki

Intro to GPU


GPU Programming Introduction

GPU computing

GPU: Graphics Processing Unit
• Traditionally used for real-time rendering of computer graphics
• High computational density and memory bandwidth
• Throughput processor: thousands of concurrent threads to hide latency
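How thousands of threads hide latency can be estimated with Little's law (concurrency needed = latency × throughput). A minimal back-of-the-envelope sketch in Python; the latency and issue-rate numbers are illustrative assumptions, not figures from this page:

```python
# Little's law: in-flight operations needed = latency * throughput.
# Illustrative (assumed) numbers: 400-cycle memory latency, and a
# memory system that can accept 4 requests per cycle.
latency_cycles = 400   # assumed DRAM latency in cycles
ops_per_cycle = 4      # assumed sustained memory requests per cycle

# Independent in-flight operations (i.e., ready threads) needed so
# the memory pipeline never sits idle waiting on any single request:
threads_needed = latency_cycles * ops_per_cycle
print(threads_needed)  # 1600 -> hence "1000s of concurrent threads"
```

With these assumptions a GPU must keep well over a thousand threads resident just to keep its memory system busy, which is why GPUs are built around massive thread counts rather than large caches.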

• GPU – graphics processing unit
• Originally designed as a graphics processor
• NVIDIA GeForce 256 (1999) – first GPU
  o single-chip processor for mathematically intensive tasks
  o transforms of vertices and polygons
  o lighting
  o polygon clipping
  o texture mapping
  o polygon rendering
• NVIDIA GeForce 3, ATI Radeon 9700 – early 2000s
  o now programmable!

Modern GPUs are present in:
✓ Embedded systems
✓ Personal computers
✓ Game consoles
✓ Mobile phones
✓ Workstations

Traditional GPU workflow
(rendering pipeline diagram: https://learnopengl.com/#!Getting-started/Hello-Triangle)

GPGPU
• 1999–2000: computer scientists from various fields started using GPUs to accelerate a range of scientific applications; GPU programming required graphics APIs such as OpenGL and Cg
• 2001: LU factorization implemented using GPUs
• 2002: James Fung (University of Toronto) developed OpenVIDIA
• NVIDIA invested heavily in the GPGPU movement, offering a number of options and libraries for a seamless experience for C, C++ and Fortran programmers

GPGPU timeline
• November 2006: NVIDIA launched CUDA, an API that allows algorithms to be coded in the C programming language for execution on GeForce GPUs
• 2008: the Khronos Group defined OpenCL, supported on AMD, NVIDIA and ARM platforms
• 2012: NVIDIA presented and demonstrated OpenACC – a set of compiler directives that greatly simplify parallel programming of heterogeneous systems

CPUs consist of a few cores optimized for serial processing and general-purpose computation, while GPUs consist of hundreds or thousands of smaller, efficient cores designed for parallel throughput. Each kind of hardware is designed for a different class of calculations.
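What "designed for parallel performance" means in practice: GPU-friendly work applies the same simple operation independently to every data element. A small Python sketch of SAXPY (y = a·x + y); the loop below is serial, but every iteration is independent of the others, so on a GPU each element would map to its own thread:

```python
# SAXPY (y = a*x + y): the archetypal data-parallel operation.
a = 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

# No iteration reads another iteration's result, so all elements
# could be computed simultaneously on parallel cores.
y = [a * xi + yi for xi, yi in zip(x, y)]
print(y)  # [12.0, 24.0, 36.0, 48.0]
```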

Peak double-precision performance:
• Intel Xeon E5-2680v4 (CPU): 2.4 GHz clock, 4 instructions per cycle with AVX2, 28 cores → 2.4 × 4 × 28 = 268.8 gigaflops
• NVIDIA Tesla P100 (GPU): single instruction per cycle, 3584 CUDA cores → 4.7 teraflops
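The CPU figure follows directly from clock × instructions per cycle × core count; a quick Python check of the slide's arithmetic:

```python
# Peak double-precision estimate for the Xeon E5-2680v4,
# using the slide's numbers: GHz x FLOPs/cycle x cores.
ghz = 2.4
flops_per_cycle = 4   # with AVX2, per the slide
cores = 28

cpu_peak_gflops = ghz * flops_per_cycle * cores   # ~268.8 gigaflops

# The Tesla P100's 4.7 teraflops is roughly 17x this CPU peak.
gpu_peak_gflops = 4700.0
ratio = gpu_peak_gflops / cpu_peak_gflops         # ~17.5
```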

Memory:
• Intel Xeon E5-2680v4 (CPU): 256 GB, 76.8 GB/s bandwidth
• NVIDIA Tesla P100 (GPU): 12 GB total, 549 GB/s bandwidth

10× GPU computing growth:
• 2008: 6,000 Tesla GPUs · 150K CUDA downloads · 77 supercomputing teraflops · 60 university courses · 4,000 academic papers
• 2015: 450,000 Tesla GPUs · 3M CUDA downloads · 54,000 supercomputing teraflops · 800 university courses · 60,000 academic papers

Three approaches to GPU acceleration of applications:
• GPU-accelerated libraries – seamless linking to GPU-enabled libraries (cuFFT, cuBLAS, Thrust, NPP, IMSL, CULA, cuRAND, etc.)
• OpenACC directives – simple directives for easy GPU acceleration of new and existing applications (PGI Accelerator)
• Programming languages – the most powerful and flexible way to design GPU-accelerated applications (C/C++, Fortran, Python, Java, etc.)

Minimum change, big speed-up: use the GPU to parallelize the compute-intensive functions of the application code, while the rest of the sequential code keeps running on the CPU.
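The limit of this "minimum change, big speed-up" model is given by Amdahl's law: overall speedup is bounded by the fraction of runtime that stays sequential on the CPU. A small sketch; the 90% parallel fraction and 20× kernel speedup are assumed for illustration:

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime is accelerated."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)

# Assume 90% of runtime is in compute-intensive functions that the
# GPU accelerates 20x, while the remaining 10% stays on the CPU.
s = amdahl_speedup(0.9, 20.0)
print(round(s, 2))  # 6.9 -- well below 20x because of the serial 10%
```

Even a large kernel speedup yields a modest overall gain when a tenth of the work remains serial, which is why profiling for the truly compute-intensive functions matters before porting.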

Will execution on a GPU accelerate my application? The application should be:
• Computationally intensive – the time spent on computation significantly exceeds the time spent transferring data to and from GPU memory
• Massively parallel – the computations can be broken down into hundreds or thousands of independent units of work
• Well suited to GPU architectures – some algorithms or implementations will not perform well on the GPU
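The "computationally intensive" criterion can be checked with a quick estimate: compare the time to move the data to and from the GPU against the time the GPU needs to compute on it. The data size, PCIe bandwidth and arithmetic intensity below are illustrative assumptions; only the 4.7 teraflops is the P100 figure quoted earlier:

```python
# Quick feasibility estimate: offloading pays off only when compute
# time on the GPU dwarfs the time to move data over the bus.
data_gb = 1.0             # assumed data set size
pcie_gb_per_s = 16.0      # assumed PCIe 3.0 x16 bandwidth
gpu_tflops = 4.7          # Tesla P100 double precision
flops_per_byte = 100      # assumed arithmetic intensity of the kernel

transfer_s = 2 * data_gb / pcie_gb_per_s          # copy in and back out
compute_s = data_gb * 1e9 * flops_per_byte / (gpu_tflops * 1e12)

# Here transfer time (0.125 s) exceeds compute time (~0.02 s), so
# this assumed workload is not compute-intensive enough to benefit.
print(compute_s > transfer_s)  # False
```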

GPU programming options by language and domain:
• C – OpenACC, CUDA C
• C++ – Thrust, CUDA C++
• Fortran – OpenACC, CUDA Fortran
• Python – PyCUDA, PyOpenCL
• Numerical analytics – MATLAB, Mathematica
• Machine learning – Theano, TensorFlow, Caffe, Torch, etc.