7.1 Sequential versions with 2 compilers

We will begin by showing that the same vector addition source file can be compiled and run with two different C compilers: gcc and pgcc. All of the examples in this chapter share the same functions for gathering command line arguments, and the same utility functions that initialize the vectors, print them, and check for correct results. In the repository these live in separate files. Below they are shown in code blocks that are for reading only and cannot be run on their own.

Note

The OpenACC compiler, pgcc, works with files that use the .c suffix, because OpenACC code files are ordinary C files that a plain C compiler can also compile. Although pgcc creates CUDA code behind the scenes for versions that will run on the GPU, we can think of an OpenACC code file as a C code file with pragmas, much like an OpenMP code file.

Command line argument handling

The following code block contains the functions that gather the command line arguments for our vector addition program. It uses the C library function getopt(), which lets you pass options like this when you run the program from the command line on your own machine:

./vectorAdd -n 1024

The getopt() function is called on line 19 below, in the condition of the while loop. If you have not used it before, its manual page and many web tutorials describe how it works.

```c
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <ctype.h>

void getArguments(int argc, char *argv[], int *n, int *numThreads);
int isNumber(char s[]);
void exitWithError(char cmdFlag, char **argv);
void Usage(char *program);

void getArguments(int argc, char *argv[], int *n, int *numThreads)
{
    char *nvalue;            // number of elements
    char *numThreads_value;  // number of threads
    int c;                   // result from getopt calls

    // The leading ':' in the option string makes getopt() return ':'
    // (instead of '?') when -n or -t is given without a value.
    while ((c = getopt(argc, argv, ":n:t:")) != -1) {
        switch (c) {
        case 'n':
            if (isNumber(optarg)) {
                nvalue = optarg;
                *n = atoi(nvalue);
            } else {
                exitWithError(c, argv);
            }
            break;

        case 't':
            if (isNumber(optarg)) {
                numThreads_value = optarg;
                *numThreads = atoi(numThreads_value);
            } else {
                exitWithError(c, argv);
            }
            break;

        case ':':   // option given without its value, e.g. "-n" alone
            fprintf(stderr, "Missing arg for -%c\n", optopt);
            Usage(argv[0]);
            exit(EXIT_FAILURE);
            break;

        case '?':   // option letter that is not in the option string
            if (isprint(optopt)) {
                fprintf(stderr, "Unknown option `-%c'.\n", optopt);
            } else {
                fprintf(stderr,
                        "Unknown non-printable option character `\\x%x'.\n",
                        optopt);
            }
            Usage(argv[0]);
            exit(EXIT_FAILURE);
            break;
        }
    }
}

// Returns 1 if s is a non-empty string of decimal digits, 0 otherwise.
int isNumber(char s[])
{
    if (s[0] == '\0')
        return 0;
    for (int i = 0; s[i] != '\0'; i++) {
        // cast avoids undefined behavior for negative char values
        if (!isdigit((unsigned char)s[i]))
            return 0;
    }
    return 1;
}

void exitWithError(char cmdFlag, char **argv)
{
    fprintf(stderr, "Option -%c needs a number value\n", cmdFlag);
    Usage(argv[0]);
    exit(EXIT_FAILURE);
}

void Usage(char *program)
{
    fprintf(stderr, "Usage: %s [-n numElements] [-t numThreads]\n", program);
}
```

Helper functions used by each example

The following functions are used by every main program in this section and in the sections that follow.

Sequential main program: gcc compiler

The main program below is compiled together with the two code blocks above. As in other examples in this book and in the PDC for Beginners book, there is a place below the code where you can change the command line arguments. There is now also a box that exposes the compiler arguments, which in this case are passed to the gcc compiler. Run it first with the values provided.

```c
#include <math.h>   // fmaxf function (used when checking results)
#include <stdio.h>  // printf
#include <stdlib.h> // malloc

// getArguments(), initialize(), showVec(), and checkForErrors() are defined
// in the command line argument and helper function code blocks.

// CPU version of add sequentially
void CPUadd(int n, float *x, float *y)
{
    for (int i = 0; i < n; i++) {
        y[i] = x[i] + y[i];
    }
}

int main(int argc, char **argv)
{
    printf("Vector addition using several compilers.\n");
    // Set up size of arrays for vectors
    // int N = 1<<20;
    // same value, shown as multiple of 1024
    int N = 1024*1024;

    int numThreads = 1;

    // get command line args to change size of array
    getArguments(argc, argv, &N, &numThreads);

    // ignore numThreads for sequential case
    if (numThreads != 1) {
        numThreads = 1;     // not used below
        printf("Warning: this is a sequential version and the number of threads is always 1, even though you used -t\n");
    }

    printf("size (N) of 1D arrays are: %d\n\n", N);

    // host vectors
    float *x, *y;

    // Size, in bytes, of each vector
    size_t bytes = N*sizeof(float);

    // Allocate memory for each vector on host
    x = (float*)malloc(bytes);
    y = (float*)malloc(bytes);
    if (x == NULL || y == NULL) {   // large N can exhaust memory
        fprintf(stderr, "Could not allocate %zu bytes per vector\n", bytes);
        free(x);
        free(y);
        return EXIT_FAILURE;
    }

    // initialize x and y arrays on the host
    initialize(x, y, N);  // set values in each vector

    if (N < 40) {   // debug
        printf("x:\n");
        showVec(x, N);
        printf("y:\n");
        showVec(y, N);
    }

    printf("add vectors on host\n");

    CPUadd(N, x, y);

    if (N < 40) {   // debug
        printf("y result:\n");
        showVec(y, N);
    }

    checkForErrors(y, N);

    printf("execution complete\n");

    // Release host memory
    free(x);
    free(y);

    return 0;
}
```

Notes:

For this code, if each array has fewer than 40 values, the arrays are printed so that you can verify visually what is in them and that they were added correctly. You can experiment with the ‘10’ in the command line arguments to see this. The arrays are initialized so that each value in array x is 1.0 and each value in array y is 2.0, which makes it straightforward to check whether the results are correct: every element of the result should be 3.0.
Since this is a sequential version, the -t option for the number of threads is ignored, and a warning is printed if you use it.
The default compiler flag provided is -O2 (capital O, not zero), an optimization level that produces reasonably fast code and is close to the one used by the pgcc compiler below. To see a compiler error, try adding ‘-foo’ as another compiler argument after ‘-O2’.
Not shown, but also used, is the linker argument -lm, which links the math library functions used to check that the result is correct.

Exercise:

Try running with a significantly larger number of elements in each array. Note that in this simple case the main function always checks whether the result is correct. You can also remove ‘-n’, ‘10’ from the command line arguments completely to use the default array size set in the code.
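Before trying much larger sizes, it can help to estimate how much host memory the two vectors need. Assuming a 4-byte float, which is typical but not guaranteed by C (the code itself uses sizeof(float)), the default N = 1024*1024 works out as:

$$
\begin{aligned}
\text{bytes per vector} &= N \times \texttt{sizeof(float)} = 1{,}048{,}576 \times 4 = 4{,}194{,}304 \text{ bytes} = 4\ \text{MiB} \\
\text{bytes for } x \text{ and } y &= 2 \times 4{,}194{,}304 = 8{,}388{,}608 \text{ bytes} = 8\ \text{MiB}
\end{aligned}
$$

Each doubling of N doubles these figures. Because N is stored in an int, values given with -n must also stay at or below INT_MAX (2,147,483,647 on common platforms).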

Sequential main program: pgcc compiler

Now we will use the new pgcc compiler for this code. Note the change in the compiler arguments below for this compiler. In particular, the compiler flag -acc=host indicates that sequential code for the host’s CPU should be generated for the code blocks where OpenACC pragmas appear. In addition, the flag for generating optimized code is -fast (rather than an -O flag as used with gcc), and -Minfo=opt is used to provide some information about the optimizations the compiler performed.

As before, the main program is compiled together with the two code blocks for command line arguments and helper functions. Run it to see the program’s output along with the output from the pgcc compiler, which is explained below the code.

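```c
#include <math.h>   // fmaxf function (used when checking results)
#include <stdio.h>  // printf
#include <stdlib.h> // malloc

// getArguments(), initialize(), showVec(), and checkForErrors() are defined
// in the command line argument and helper function code blocks.

// CPU version of add sequentially
void CPUadd(int n, float *x, float *y)
{
    for (int i = 0; i < n; i++) {
        y[i] = x[i] + y[i];
    }
}

int main(int argc, char **argv)
{
    printf("Vector addition using several compilers.\n");
    // Set up size of arrays for vectors
    // int N = 1<<20;
    // same value, shown as multiple of 1024
    int N = 1024*1024;

    int numThreads = 1;

    // get command line args to change size of array
    getArguments(argc, argv, &N, &numThreads);

    // ignore numThreads for sequential case
    if (numThreads != 1) {
        numThreads = 1;     // not used below
        printf("Warning: this is a sequential version and the number of threads is always 1, even though you used -t\n");
    }

    printf("size (N) of 1D arrays are: %d\n\n", N);

    // host vectors
    float *x, *y;

    // Size, in bytes, of each vector
    size_t bytes = N*sizeof(float);

    // Allocate memory for each vector on host
    x = (float*)malloc(bytes);
    y = (float*)malloc(bytes);
    if (x == NULL || y == NULL) {   // large N can exhaust memory
        fprintf(stderr, "Could not allocate %zu bytes per vector\n", bytes);
        free(x);
        free(y);
        return EXIT_FAILURE;
    }

    // initialize x and y arrays on the host
    initialize(x, y, N);  // set values in each vector

    if (N < 40) {   // debug
        printf("x:\n");
        showVec(x, N);
        printf("y:\n");
        showVec(y, N);
    }

    printf("add vectors on host\n");

    CPUadd(N, x, y);

    if (N < 40) {   // debug
        printf("y result:\n");
        showVec(y, N);
    }

    checkForErrors(y, N);

    printf("execution complete\n");

    // Release host memory
    free(x);
    free(y);

    return 0;
}
```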

Here is how to interpret the output from running this version:

The output from the program comes first. Note that it is the same as the output of the default gcc version above.
After the program output, there is a line that looks like this: ===== STANDARD ERROR =====. What follows it is the output from the pgcc compiler, which we requested with the compiler flag ‘-Minfo=opt’. The pgcc compiler provides this option so that you can see which optimizations it applied as a result of the -fast option. The optimization used here was loop unrolling; see the Wikipedia page on loop unrolling for a discussion of this technique.

As with the previous gcc version, try larger array sizes.

Same code, two compilers

This example uses the same code, but illustrates slight differences in the flags used by each compiler. Each compiler produces different machine code.

We will next look at how the pgcc compiler can generate machine code for a file with OpenMP pragmas, then follow with a different multicore version and ultimately a GPU version.

Note

The pgcc/nvc compiler documentation indicates that the compiler flag -fast is roughly equivalent to -O2. Each compiler is different, however, and generates different executable code. Using a higher optimization level, such as -O3, sometimes makes the code faster and sometimes does not, so you always have to run experiments to find out.
