3.4.1 2-dimensional grids with nested loops¶

A common computational pattern in PDC applications is the use of a structured grid to represent or model a real-life situation in nature. This computational pattern is listed in the second column from the left in the patterns diagram in Chapter 1 of this book. At the heart of these computations are 2-dimensional arrays, or grids.

In this example, we will create the 2-dimensional grid of width w and length l as a flattened array in this manner:

// a flattened 2D grid of doubles of size width x length
int arraySize = (width) * (length);
size_t bytes = arraySize * sizeof(double);
double *grid = (double *)malloc(bytes);

In this scheme, the width is the number of columns and the length is the number of rows. The command line arguments enable you to change these values (they replace the number of repetitions in the previous examples):
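The allocation above can also be wrapped in a small helper. What follows is a minimal sketch rather than part of the original program: the name allocGrid is hypothetical, and the size check is an added precaution so that a very large width times length cannot overflow before memory is requested.

```cpp
#include <assert.h>
#include <stdint.h>  // SIZE_MAX
#include <stdlib.h>  // calloc(), free()

// Hypothetical helper (not in the original program): allocate a zeroed,
// flattened grid with `length` rows and `width` columns.
// Returns NULL if the byte count would overflow or the allocation fails.
double *allocGrid(int width, int length) {
    assert(width > 0 && length > 0);

    size_t cols = (size_t)width;
    size_t rows = (size_t)length;

    // refuse sizes whose byte count cannot be represented in size_t
    if (rows > SIZE_MAX / sizeof(double) / cols) {
        return NULL;
    }
    return (double *)calloc(rows * cols, sizeof(double));
}
```

A caller would check the result for NULL before using it and release it with free() when finished, just as the full program does with its grid.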

-t indicates number of threads to use.
-w indicates the width of the 2D grid (default is 8).
-l indicates the length of the 2D grid (default is 8).
-c indicates that a fixed seed will be used, resulting in the same
   stream of numbers each time this is run.
-d indicates whether the trng generator will dole out numbers in
   block or in leapfrog fashion. (default is block).

Another change you might note here is that the default method of splitting up the stream of random numbers is now blocks rather than leapfrog. Block splitting makes better use of cache memory, because the nested loops can then access the grid in row-major order: row by row, column by column. Let’s examine this in more detail. In a 5 x 5 grid, the cell in row i and column j is stored at index i * 5 + j of the flattened 1D array, so row 0 holds indices 0 through 4, row 1 holds indices 5 through 9, and so on up to index 24.
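The row-major layout of a 5 x 5 grid can be printed with a few lines of code. The helper names flatIndex and printIndexLayout are hypothetical and used only for illustration; the formula is the same i * w + j that appears in the loops on this page.

```cpp
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: flattened index of the cell at row i, column j
// in a grid that is w columns wide.
int flatIndex(int i, int j, int w) {
    assert(i >= 0 && j >= 0 && j < w);
    return i * w + j;
}

// Print the flattened index of every cell, one grid row per line.
void printIndexLayout(int w, int l) {
    for (int i = 0; i < l; i++) {
        for (int j = 0; j < w; j++) {
            printf("%3d", flatIndex(i, j, w));
        }
        printf("\n");
    }
}
```

Calling printIndexLayout(5, 5) prints five lines, starting with the indices 0 to 4 and ending with the indices 20 to 24.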

A natural way to traverse this flattened 2D grid in a sequential fashion, placing random numbers in it, is as follows:

int i, j;
double randN;

for (i = 0; i < l; i++) {    // row by row
    for (j = 0; j < w; j++) {    // column by column
        randN = uni(RNengine);

        int id = i * w + j;    // flattened 2D index

        grid[id] = randN;
    }
}
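To see why the order of the two loops matters for cache use, the sketch below counts how many steps of a traversal move to the very next element in memory. It is an illustration under simple assumptions, not a cache simulation, and both function names are hypothetical.

```cpp
#include <assert.h>

// Count the steps that land on the adjacent array element (stride 1)
// when the row loop is the outer loop, as in the code above.
int adjacentStepsRowMajor(int w, int l) {
    assert(w > 0 && l > 0);
    int adjacent = 0;
    int prev = -1;  // flattened index visited on the previous step
    for (int i = 0; i < l; i++) {
        for (int j = 0; j < w; j++) {
            int id = i * w + j;
            if (prev >= 0 && id == prev + 1) adjacent++;
            prev = id;
        }
    }
    return adjacent;
}

// The same count when the loops are swapped so that the column loop
// is the outer loop.
int adjacentStepsColumnMajor(int w, int l) {
    assert(w > 0 && l > 0);
    int adjacent = 0;
    int prev = -1;
    for (int j = 0; j < w; j++) {
        for (int i = 0; i < l; i++) {
            int id = i * w + j;
            if (prev >= 0 && id == prev + 1) adjacent++;
            prev = id;
        }
    }
    return adjacent;
}
```

For a 5 x 5 grid the row-by-row order gives 24 adjacent steps (every step after the first), while the swapped loops give 0, because each step inside a column jumps w elements ahead.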

Now let’s suppose that we want to do this in parallel by having each thread work on a certain number of rows. With 2 threads we can imagine a decomposition like this, with the green cells for thread 0 and the blue cells for thread 1:

../_images/matrix_flattened_indices_2threads.png

The code for populating each row is found in the function called populateRows in the code block below (line 152 in the code you can run later). Note that this function is executed independently by each thread. Here it is pulled out so that you can compare it to the sequential version above:

// Traverse row by row, knowing the start and end row for this thread.
// PREREQUISITE:  block splitting of the random numbers between threads is being used.
//
void populateRows(double *grid, int w, int l, trng::yarn2 RNengine, trng::uniform01_dist<> uni,
                int startRow, int endRow, int tid) {
    int i, j;
    double randN;

    for (i = startRow; i < endRow; i++) {
        for (j = 0; j < w; j++) {
            randN = uni(RNengine);     // inside loop

            int id = i * w + j;    // flattened 2D index

            if (w <= 8 && l <= 8) {// for debugging, print the random number and indices
                printf("%0.3f %2d %2d %d %d |\n", randN, id, tid, i, j);
            }

            grid[id] = randN;
        }
    }
}

As you can see, the difference in the nested loops between the sequential and the parallel version is that each thread will work on a particular set of rows, generating random numbers from a block that was assigned to it. This is done with a new function called getStartStopRow (shown below) that each thread executes. The starting point for a block of random numbers to be used by a thread is created as follows inside the function createNewGrid (line 132 in the code you can run below):

// Use block splitting to partition random numbers among threads
getStartStopRow(tid, numThreads, l, &startRow, &endRow);   // enables unequal blocks per thread
long unsigned int numsToSkip = (long unsigned int)startRow * (long unsigned int)w;
RNengine.jump(numsToSkip);

Let’s visualize this. A stream of random numbers between 0 and 1.0 can be generated and placed row-by-row, column-by-column into a 5x5 grid, so that the number at position k of the stream (counting from 0) lands at flattened index k.

If we use the code above with 2 threads, each thread will use the numbers in blocks of rows like this:

../_images/matrix_flattened_RNs_2threads.png

Using 3 threads will split the stream into blocks of rows like this:

../_images/matrix_flattened_RNs_3threads.png
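The claim behind these pictures, that block splitting reproduces the sequential grid, can be checked with a small experiment. Because trng is not part of the standard library, this sketch uses a stand-in called ToyStream whose values are a simple function of the stream position; it is not trng, and toyJump only plays the role that RNengine.jump plays in the real code. All names here are hypothetical.

```cpp
#include <assert.h>

// Stand-in for a jump-capable generator (NOT trng): the value at each
// stream position is a fixed function of that position.
typedef struct {
    unsigned long pos;  // next stream position to hand out
} ToyStream;

double toyNext(ToyStream *s) {
    double v = (double)((s->pos * 37UL + 11UL) % 1000UL) / 1000.0;
    s->pos++;
    return v;
}

// Skip ahead n numbers without generating them.
void toyJump(ToyStream *s, unsigned long n) {
    s->pos += n;
}

// What one thread does: start from the common seed, jump past the numbers
// that belong to earlier rows, then fill rows [startRow, endRow).
void fillRows(double *grid, int w, int startRow, int endRow) {
    assert(w > 0 && startRow >= 0 && startRow <= endRow);
    ToyStream s = {0};
    toyJump(&s, (unsigned long)startRow * (unsigned long)w);
    for (int i = startRow; i < endRow; i++) {
        for (int j = 0; j < w; j++) {
            grid[i * w + j] = toyNext(&s);
        }
    }
}

// Returns 1 when filling a 5x5 grid in two blocks of rows produces
// exactly the grid that a single sequential pass produces.
int blocksMatchSequential(void) {
    double seq[25], par[25];
    fillRows(seq, 5, 0, 5);  // one thread fills everything
    fillRows(par, 5, 0, 3);  // "thread 0": rows 0-2
    fillRows(par, 5, 3, 5);  // "thread 1": rows 3-4
    for (int k = 0; k < 25; k++) {
        if (seq[k] != par[k]) return 0;
    }
    return 1;
}
```

The two calls that fill par are independent of each other, which is why real threads can execute them at the same time. The row ranges 0-2 and 3-4 are the ones getStartStopRow produces for 5 rows and 2 threads.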

TO DO:

Try the code below by using -t values of 1, 2, and 3. Does it match the images above?

#include <stdio.h>  // printf()
#include <time.h>   // time()
#include <string.h> // C string functions
#include <stdlib.h> // malloc(), free()
#include <omp.h>

// trng YARN (yet another random number) generator class
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp> // we'll use a uniform distribution
                                   // of the random numbers

void createNewGrid(unsigned long int seed, double *grid, int w, int l, int doleOut);
void populateRows(double *grid, int w, int l, trng::yarn2 RNengine,
                  trng::uniform01_dist<> uni, int startRow, int endRow, int tid);
void populateColumns(double *grid, int w, int l, int numThreads,
                     trng::yarn2 RNengine, trng::uniform01_dist<> uni, int tid);
void seqSet(int repetitions, long unsigned int seedValue);

#define LEAPFROG 0
#define BLOCKSPLIT 1

////////////////////////////////////////////////////////////
int main(int argc, char *argv[])
{

    // set up what conditions we will use:
    //  - default width and length of the grid
    //  - whether to use a constant seed so we can repeat and
    //    get same random values in stream

    int width = 8;
    int length = 8;
    int useConstantSeed = 0; // for same stream, set this to 1 on command line with -c

    // method the random generator will use to dole out the numbers
    int doleOut = BLOCKSPLIT;

    // for OpenMP
    int numThreads = 1;

    // gather command line arguments
    getArguments(argc, argv, &numThreads, &width, &length,
                 &useConstantSeed, &doleOut);

    // check for a valid number of threads vs. rows in the grid
    if (numThreads > length) {
        printf("\n*** Number of threads (%d) exceeds rows in grid (%d)\n", numThreads, length);
        printf("*** Please run with -t value less than or equal to %d\n\n", length);

        return 0;
    }

    // create an array to hold the random numbers on the 'heap':
    // a flattened 2D grid of doubles of size width x length
    int arraySize = (width) * (length);
    size_t bytes = arraySize * sizeof(double);
    double *grid = (double *)malloc(bytes);

    omp_set_num_threads(numThreads);
    // Print out info
    printf("trng random number stream will be split ");
    if (doleOut == LEAPFROG) {
        printf("using leapfrog and populate the grid by columns.\n");
    }
    else {
        printf("into blocks and populate the grid by rows.\n");
    }
    printf("The nested loop is partitioned into possibly slightly unequal chunks per thread.\n");

    // ///////////////  random generator setup /////////////////////
    // random numbers start from a seed value
    long unsigned int seedValue; // note for trng this is long unsigned

    // same constant seed will generate the same sequence of random numbers
    // use for testing to verify same sequence regardless of number of threads
    if (useConstantSeed) {
        seedValue = 888777666;
    } else {      // variable seed based on computer clock time; use for simulations
        seedValue = (long unsigned int)time(NULL);
    }

    // debugging info
    if (width <= 8 && length <= 8) {
        int sampleSize = (width * length) + 16;
        printf("the stream of random numbers is:\n");
        seqSet(sampleSize, seedValue); // print a sample of random numbers for debugging
        printf("The per-thread output is printed like this:\n");
        printf("randNumber flattenedIndex threadID row col |\n");
    }

    // random numbers into new grid
    createNewGrid(seedValue, grid, width, length, doleOut);

    // print final grid for debugging
    if (width <= 8 && length <= 8) {
        printf("\nFinal grid of random numbers:\n");
        printGrid(grid, width, length);
    }
    // free the grid memory
    free(grid);

    return 0;
}

// Create a new grid of random numbers using trng and OpenMP.
//
void createNewGrid(unsigned long int seed, double *grid, int w, int l, int doleOut) {

    trng::yarn2 RNengine;       // generator
    trng::uniform01_dist<> uni;  // uniform distribution for random numbers
                                 // in the range [0.0, 1.0)

    #pragma omp parallel default(none) \
    shared(grid, w, l, seed, doleOut) private(RNengine, uni)
    {

        RNengine.seed((long unsigned int)seed); // seed the generator

        int tid = omp_get_thread_num();
        int numThreads = omp_get_num_threads();

        // for blocking mode case
        int startRow = 0;
        int endRow = 0;
        if (doleOut == LEAPFROG) {
            if (numThreads > 1) {
                // Use leapfrogging to partition random numbers among threads
                RNengine.split((unsigned)numThreads, tid);
            }
        } else {
            // Use block splitting to partition random numbers among threads
            getStartStopRow(tid, numThreads, l, &startRow, &endRow);   // enables unequal blocks per thread
            long unsigned int numsToSkip = (long unsigned int)startRow * (long unsigned int)w;
            RNengine.jump(numsToSkip);
        }

        // iterate over the grid by either rows or columns depending on doleOut method
        if (doleOut == LEAPFROG) {
            populateColumns(grid, w, l, numThreads, RNengine, uni, tid);

        } else {
            populateRows(grid, w, l, RNengine, uni, startRow, endRow, tid);
        }
    }  // end of parallel region
}

// Traverse row by row, knowing the start and end row for this thread.
// PREREQUISITE:  block splitting of the random numbers between threads is being used.
//
void populateRows(double *grid, int w, int l, trng::yarn2 RNengine, trng::uniform01_dist<> uni,
                  int startRow, int endRow, int tid) {
    int i, j;
    double randN;

    for (i = startRow; i < endRow; i++) {
        for (j = 0; j < w; j++) {
            randN = uni(RNengine);     // inside loop

            int id = i * w + j;    // flattened 2D index

            if (w <= 8 && l <= 8) {// for debugging, print the random number and indices
                printf("%0.3f %2d %2d %d %d |\n", randN, id, tid, i, j);
            }

            grid[id] = randN;
        }
    }
}

// Traverse column by column per thread to populate the grid.
// PREREQUISITE:  using leapfrog method of splitting random number
// stream among threads.
//
void populateColumns(double *grid, int w, int l, int numThreads,
                     trng::yarn2 RNengine, trng::uniform01_dist<> uni, int tid) {
    int i, j;
    double randN;

    for (i = 0; i < l; i++) {
        for (j = tid; j < w; j += numThreads) {
            randN = uni(RNengine);     // inside loop

            int id = i * w + j;    // flattened 2D index

            if (w <= 8 && l <= 8) {// for debugging, print the random number and indices
                printf("%0.3f %2d %d %d %d |\n", randN, id, tid, i, j);
            }
            grid[id] = randN;
        }
    }
}

// Print a sequential set of random numbers for debugging.
void seqSet(int repetitions, long unsigned int seedValue) {

    // number generation needs two things: a generator and a distribution of the numbers
    // declare the generator object
    trng::yarn2 randGen;
    // declare the distribution to use (here it is uniform with values in the range [0.0, 1.0))
    trng::uniform01_dist<> uniform;

    // Set the starting point of the generator by seeding it
    randGen.seed(seedValue);

    // ////////////////////// end PRNG setup //////////////////////////////////

    // ///////////////////// get a portion of the stream ///////////////////////
    double nextRandValue;   // holds the next value as we go through the loop

    // loop to get each number in the PRNG stream and print it
    // Note here the ubiquitous for loop construction of incrementing by 1
    int i;
    for (i = 0; i < repetitions; i++) {
        // get next number in the stream from the distribution
        nextRandValue = uniform(randGen);
        // print the value, starting a new line after every 20 values
        printf("%0.3f ", nextRandValue);
        if ((i + 1) % 20 == 0) {
            printf("\n");
        }
    }
    printf("\n");
}

Warning

If you try sizes larger than 7 x 7, the printing of the resulting matrix gets cut off. For arrays beyond 8x8, you can see them if you comment out lines 96 and 99 so that the grid prints out. However, the largest grid you can see in full has a width of 18 and a length of 14.

The bottom line¶

What we can observe from this example is that a 2D grid, flattened into a 1D array of contiguous values in memory, can be populated using multiple threads by having the nested loop that traverses the array row-by-row split into segments of rows. The block-splitting mechanism of the trng parallel random number library can also be used to segment the stream of random numbers in a similar fashion.

It is also possible to use trng’s leapfrogging method of assigning numbers from the stream of random numbers to each thread, but unfortunately the more natural way of combining this with the nested loop is to have threads work on columns of the grid, which is less efficient for cache memory use. You can try it to see it in action. It uses more numbers of the random stream by skipping some of them. Why this happens is beyond the scope of this chapter.
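For curious readers, the bookkeeping behind that observation can be explored with a short sketch. It assumes the usual definition of leapfrogging, in which thread tid out of numThreads receives stream positions tid, tid + numThreads, tid + 2 * numThreads, and so on, and it mirrors the column loop of populateColumns. The function names are hypothetical and the sketch does not call trng.

```cpp
#include <assert.h>

// Stream position consumed for cell (i, j) when the grid is filled the way
// populateColumns does it, ASSUMING leapfrogging gives thread `tid` the
// positions tid, tid + numThreads, tid + 2*numThreads, ...
long leapfrogPosition(int i, int j, int w, int numThreads) {
    assert(numThreads > 0 && w > 0 && i >= 0 && j >= 0 && j < w);
    int tid = j % numThreads;  // thread that owns column j
    // number of columns this thread fills in every row
    int colsPerRow = (w - tid + numThreads - 1) / numThreads;
    // numbers this thread has drawn before it reaches cell (i, j)
    long drawsBefore = (long)i * colsPerRow + (j - tid) / numThreads;
    return tid + drawsBefore * numThreads;
}

// Largest stream position touched while filling a w x l grid this way.
long lastLeapfrogPosition(int w, int l, int numThreads) {
    long last = 0;
    for (int i = 0; i < l; i++) {
        for (int j = 0; j < w; j++) {
            long p = leapfrogPosition(i, j, w, numThreads);
            if (p > last) last = p;
        }
    }
    return last;
}
```

Under this assumption, a 5 x 5 grid filled by 2 threads reaches stream position 28 even though only 25 numbers are stored, and cell (1, 0) receives position 6 instead of 5. When the width is a multiple of the number of threads, for example a 4 x 4 grid with 2 threads, every cell receives the position equal to its flattened index.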

Calculating the start and stop rows per thread¶

The following code shows the function getStartStopRow used in the example above. Study it to see how the values are determined for a flattened 2D grid of particular length l.

#include <math.h>   // ceil()

/*
 * For 2D grids:
 *  Obtain the starting and stopping row values for each thread's
 *  block of rows when using blocking in trng's
 *  assignment of random numbers to threads.
 *
 * Libby Shoop, Macalester College, Fall 2025
 *  with special thanks to Rocky Slaymaker for inspiration
 *
 *
 *  @param: tid: thread ID
 *  @param: numThreads: total number of threads
 *  @param: length: total number of rows in the 2D grid
 *  @param: startRow: pointer to store starting row int
 *  @param: endRow: pointer to store ending row int
 *
 *  trng's jump() function can be used to skip ahead
 *  in the random number sequence, but it requires
 *  the number of random numbers to skip. This function
 *  helps compute that value for each thread.
 *
 */

void getStartStopRow(int tid, int numThreads, int length, int *startRow, int *endRow) {
    int rowsPerThread = length / numThreads;
    int extraRows = length % numThreads;

    if (tid < extraRows) {
        *startRow = tid * (rowsPerThread + 1);
        *endRow = *startRow + rowsPerThread + 1;
    } else {
        *startRow = (tid * rowsPerThread) + extraRows;
        *endRow = *startRow + rowsPerThread;
    }

}
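One way to convince yourself that this arithmetic assigns every row to exactly one thread is to check it mechanically. The sketch below uses a hypothetical compact variant called rowRange, which computes the same values with a single formula, plus a hypothetical checker called rangesTileGrid.

```cpp
#include <assert.h>

// Hypothetical compact variant of the same arithmetic: the first
// `extraRows` threads each take one extra row.
void rowRange(int tid, int numThreads, int length, int *startRow, int *endRow) {
    assert(numThreads > 0 && tid >= 0 && tid < numThreads && length >= 0);
    int rowsPerThread = length / numThreads;
    int extraRows = length % numThreads;
    // number of lower-numbered threads that took an extra row: min(tid, extraRows)
    int extrasBefore = (tid < extraRows) ? tid : extraRows;

    *startRow = tid * rowsPerThread + extrasBefore;
    *endRow = *startRow + rowsPerThread + ((tid < extraRows) ? 1 : 0);
}

// Returns 1 if the per-thread ranges cover rows 0..length-1 in order
// with no gaps and no overlaps, 0 otherwise.
int rangesTileGrid(int numThreads, int length) {
    int expectedStart = 0;
    for (int tid = 0; tid < numThreads; tid++) {
        int startRow, endRow;
        rowRange(tid, numThreads, length, &startRow, &endRow);
        if (startRow != expectedStart || endRow < startRow) return 0;
        expectedStart = endRow;
    }
    return expectedStart == length;
}
```

For 5 rows, 2 threads get rows 0-2 and 3-4, and 3 threads get rows 0-1, 2-3, and 4, where endRow is one past the last row of each range.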

Helper function for printing¶
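
The main program calls printGrid(grid, width, length) to show the final grid. The following is only a minimal sketch of what such a helper could look like: the signature is assumed from that call, the output format is a guess modeled on the other debugging output, and fprintGrid is a hypothetical name added so the output stream can be chosen.

```cpp
#include <assert.h>
#include <stdio.h>

// Hypothetical sketch: write the grid one row per line,
// with three decimal places per value.
void fprintGrid(FILE *out, const double *grid, int w, int l) {
    assert(out != NULL && grid != NULL);
    for (int i = 0; i < l; i++) {
        for (int j = 0; j < w; j++) {
            fprintf(out, "%0.3f ", grid[i * w + j]);
        }
        fprintf(out, "\n");
    }
}

// Signature assumed from the call printGrid(grid, width, length) in main().
void printGrid(double *grid, int w, int l) {
    fprintGrid(stdout, grid, w, l);
}
```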

Obtaining command line arguments for this example¶

#include <stdio.h>
#include <unistd.h>  // getopt(), optarg, optopt
#include <stdlib.h>
#include <ctype.h>
#include <string.h>  // strcmp()

#define LEAPFROG 0
#define BLOCKSPLIT 1

void Usage2D(char *program) {
    fprintf(stderr, "This program demonstrates use of a loop to create a stream of random numbers in parallel.\n");
    fprintf(stderr, "Usage: %s [-h] [-t numThreads] [-w width] [-l length] [-c] [-d block|leapfrog]\n", program);
    fprintf(stderr, "   -h shows this message and exits.\n");
    fprintf(stderr, "   -t indicates number of threads to use.\n");
    fprintf(stderr, "   -w dim         : width of the grid (default: 8)\n");
    fprintf(stderr, "   -l dim         : vertical length of the grid (default: 8)\n");
    fprintf(stderr, "   -c indicates that a fixed seed will be used, resulting in the same stream of numbers each time this is run.\n");
    fprintf(stderr, "   -d indicates whether the trng generator will dole out numbers in blocks or in leapfrog fashion. default is block.\n");
}

// Check a string as a number containing all digits
int isNumber(char s[])
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        if (isdigit((unsigned char)s[i]) == 0)
            return 0;
    }

    return 1;
}

// Called when isNumber() fails
void exitWithError(char cmdFlag, char **argv) {
    fprintf(stderr, "Option -%c needs a number value\n", cmdFlag);
    Usage2D(argv[0]);
    exit(EXIT_FAILURE);
}

void invalidChoice(char cmdFlag, char **argv) {
    fprintf(stderr, "Unrecognized value for option -%c: needs 'block' or 'leapfrog'\n", cmdFlag);
    Usage2D(argv[0]);
    exit(EXIT_FAILURE);
}

// Function to gather command line arguments for 2D loops and arrays
void getArguments(int argc, char *argv[],
                  int *numThreads, int *w, int *l, int *useConstantSeed, int *doleOut)
{

    int c;        // result from getopt calls

    // The : after a character means a value is expected
    // No colon means it is simply a flag with no associated value
    while ((c = getopt(argc, argv, "w:l:t:d:hc")) != -1) {

        // getopt implicitly sets a value to a char * (string) called optarg
        // to what the user typed after an option that expects a value, such as -w
        switch (c)
        {
        // character string entered after the -w needs to be a number
        case 'w':
            if (isNumber(optarg)) {
                *w = atoi(optarg);
            } else {
                exitWithError(c, argv);
            }
            break;
        // character string entered after the -l needs to be a number
        case 'l':
            if (isNumber(optarg)) {
                *l = atoi(optarg);
            } else {
                exitWithError(c, argv);
            }
            break;

        // character string entered after the -t needs to be a number
        case 't':
            if (isNumber(optarg)) {
                *numThreads = atoi(optarg);
            } else {
                exitWithError(c, argv);
            }
            break;

        case 'd':
            if (strcmp(optarg, "block") == 0) {
                *doleOut = BLOCKSPLIT;
            } else if (strcmp(optarg, "leapfrog") == 0) {
                *doleOut = LEAPFROG;
            } else {
                invalidChoice(c, argv);
            }
            break;
        // If the -h is encountered, then we provide usage
        case 'h':
            Usage2D(argv[0]);
            exit(0);
            break;

        // If the -c is encountered, then we change the constant seed flag
        case 'c':
            *useConstantSeed = 1;
            break;

        case ':':
            printf("Missing arg for %c\n", optopt);
            Usage2D(argv[0]);
            exit(EXIT_FAILURE);
            break;

        case '?':
            if (
                (optopt == 'w') || (optopt == 'l') ||
                (optopt == 't') || (optopt == 'd')
            )
            {
                Usage2D(argv[0]);
                exit(EXIT_FAILURE);
            } else if (isprint(optopt)) {
                fprintf(stderr, "Unknown option `-%c'.\n", optopt);
                Usage2D(argv[0]);
                exit(EXIT_FAILURE);
            } else {
                fprintf(stderr,
                        "Unknown non-printable option character `\\x%x'.\n",
                        optopt);
                Usage2D(argv[0]);
                exit(EXIT_FAILURE);
            }
            break;

        }
    }
}
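
The digit check above accepts a value such as 0, and atoi does not report values that are too large for an int. If stricter checking is wanted, one possibility is a helper built on the standard strtol function. This is a sketch only, and the name parsePositiveInt is hypothetical rather than part of the program above.

```cpp
#include <assert.h>
#include <errno.h>
#include <limits.h>  // INT_MAX
#include <stdlib.h>  // strtol()

// Hypothetical helper: parse a strictly positive int from a string.
// Returns 1 and stores the value in *out on success, 0 otherwise.
// Like strtol itself, it tolerates leading whitespace and a leading '+'.
int parsePositiveInt(const char *s, int *out) {
    assert(s != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0') {
        return 0;  // no digits at all, or trailing characters
    }
    if (errno == ERANGE || v <= 0 || v > INT_MAX) {
        return 0;  // out of range for an int, or not positive
    }
    *out = (int)v;
    return 1;
}
```

With a helper like this, a case such as -t could call parsePositiveInt(optarg, numThreads) and fall back to exitWithError when it returns 0.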
