6.0 OpenACC: A new standard and compiler
The software libraries and compilers that we have used so far, OpenMP and MPI, are at their core standards. These standards dictate what functionality a conforming compiler and its associated libraries must provide. We have used gcc and g++ to compile C and C++ code; these compilers conform to the OpenMP standard. The OpenMP standard has several versions, and each successive release of gcc and g++ adds most of the features of the newest version of the standard. Other C and C++ compilers also follow the OpenMP standard. MPI libraries, compilers, and associated programs such as mpirun are likewise defined by a standard. There are two primary implementations of MPI that we can install: MPICH and OpenMPI. You have actually used one of each: OpenMPI on the Raspberry Pis and MPICH on the mscs1 server.
These two standards, OpenMP for shared-memory computing and MPI for distributed networked systems or single machines, both target traditional multicore CPU computers. As we have seen, they can provide reasonable scalability, letting programs run faster and work on larger problems.
Yet many computational problems need much larger scalability. The field of supercomputing exists to develop hardware and software to meet these scalability needs. One way to provide more scalability is to turn to special devices that can be installed alongside a multicore CPU. Graphics processing units, or GPUs, are one such type of accelerator. Today GPUs come in many sizes, from small ones for mobile devices and laptops to large ones that are separate cards containing thousands of small cores, each slower than a typical CPU core.
Because GPUs are special hardware with thousands of cores, using them for parallelism is often called manycore computing. GPUs were first designed to offload graphics computations, speeding up response time for applications such as games and visualizations. Now they are also used to speed up general computations.
For manycore computing we will use a new compiler called pgcc, which is based on a different standard called OpenACC. The ACC part of OpenACC stands for accelerator: separate cards such as GPUs are often referred to as accelerators when they are used to speed up certain types of computations. The pgcc compiler is one of several compilers originally written by the Portland Group (PGI), now owned by NVIDIA, the major GPU manufacturer and creator of the CUDA compilers for running code on GPUs.
A new compiler
The pgcc compiler will process code containing pragma lines similar to those of OpenMP into several different versions of code (a brief sketch of such a pragma follows this list):
- a regular sequential version (when no pragmas are present)
- a shared-memory version using OpenMP (backward compatible with OpenMP)
- a multicore version for the CPU (using new pragmas defined in the OpenACC standard)
- a manycore version for the GPU (using additional pragmas for accelerator computation)
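As a preview, here is what such a pragma line looks like. This is our illustration, not code from the book's examples; the function name and the data clauses are assumptions:

```c
/* Sketch: one loop that pgcc can build as sequential code, multicore
   CPU code, or GPU code, depending on the flags given at compile time.
   (Illustrative only; not taken from the book's example code.) */
void vectorAdd(int n, float *x, float *y, float *result) {
    /* The OpenMP equivalent of the line below would be:
       #pragma omp parallel for                                */
    #pragma acc parallel loop copyin(x[0:n], y[0:n]) copyout(result[0:n])
    for (int i = 0; i < n; i++) {
        result[i] = x[i] + y[i];
    }
}
```

When no pragma-enabling flags are given, the pragma is ignored and the loop runs sequentially; this is what makes the single-source approach work.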
As you will see, the OpenACC standard and the PGI compilers are designed to let software developers write one program, or begin with a sequential or OpenMP program, and easily transform it into a program that can run larger problems faster by using thousands of threads on a GPU device.
Vector addition as a basic example
The example we will start with is the simplest linear algebra example: adding two vectors together. We will examine several versions of this code, demonstrating how the pgcc OpenACC compiler can create the different versions of code described above:
| Book Section | Code folder | Description |
|---|---|---|
| 6.1 | 1-sequential | 2 sequential versions, compiled with gcc and pgcc |
| 6.2 | 2-openMP | 2 OpenMP pragma versions, compiled with gcc and pgcc |
| 6.3 | 3-openacc-multicore | 1 OpenACC pragma version using just the multicore CPU, compiled with pgcc |
| 6.4 | 4-openacc-gpu | 1 OpenACC pragma version for the GPU device, compiled with pgcc |
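For orientation before diving into those sections, here is a minimal sketch of sequential vector addition. The repo's 1-sequential code will differ in details such as the problem size, timing code, and variable names, which are our assumptions here:

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal sequential vector addition (a sketch for orientation;
   the repo's 1-sequential versions differ in details). */
int main() {
    int n = 1000000;                          /* illustrative problem size */
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));
    float *result = malloc(n * sizeof(float));

    for (int i = 0; i < n; i++) {             /* initialize the inputs */
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    for (int i = 0; i < n; i++) {             /* the loop the later versions parallelize */
        result[i] = x[i] + y[i];
    }

    printf("result[0] = %f\n", result[0]);    /* expect 3.000000 */
    free(x); free(y); free(result);
    return 0;
}
```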
The code examples in these sections of the book compile and run on a remote machine and display results here in your browser.
If you want to try the code on your own machine, it is available in a GitHub repo for the CSInParallel Project, inside a folder called IntermediateOpenACC/OpenACC-basics. The file paths for code shown in this chapter, such as those in the Code folder column above, are relative to the IntermediateOpenACC/OpenACC-basics folder.
Each subfolder contains a Makefile for building its code examples.
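To illustrate how the different targets are selected, the commands below show flags typical of PGI's pgcc; the file name vectorAdd.c is our placeholder, the repo's Makefiles are the authoritative build commands, and newer NVIDIA HPC compilers spell some of these flags differently:

```bash
gcc  -fopenmp vectorAdd.c -o vadd_omp_gcc                     # OpenMP version, gcc
pgcc -mp      vectorAdd.c -o vadd_omp_pgcc                    # OpenMP version, pgcc
pgcc -acc -ta=multicore -Minfo=accel vectorAdd.c -o vadd_mc   # OpenACC, multicore CPU
pgcc -acc -ta=tesla     -Minfo=accel vectorAdd.c -o vadd_gpu  # OpenACC, NVIDIA GPU
```

The -Minfo=accel flag asks the compiler to report how it parallelized each loop, which is useful feedback when you start tuning these programs.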