3.3.1 Unequal blocks of RNs to match unequal chunks in loops

Splitting the stream of random numbers into blocks, one block per thread, is quite useful, but as we saw in the previous examples, it can behave unexpectedly when the number of random numbers needed is not evenly divisible by the number of threads or processes we use. Luckily, we can get around this fairly easily by deciding for ourselves what starting position in the stream each thread should use.

Let’s consider this example stream of numbers:

../_images/Stream-8-RNs.png

Now suppose that we would like to use the block splitting method with three threads. The technique that we used before runs into a problem here because the blocks cannot be of equal size. However, we can work around this and obtain the following assignment of the numbers to three threads:

../_images/Stream-8-RNs-block-3threads.png

We do this with a helper function, getChunkStartStopValues, which decides where the start and stop indices of a block should be, given the thread id, the total number of threads, and the number of values in the stream.
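The body of the helper is not reproduced in this text, so the following is only a minimal sketch of what such a function might look like, not the author's implementation. The parameter order is taken from the call shown in this section. Two things are assumptions: that the first n % numThreads threads each receive one extra value, and that stop is an exclusive bound. This layout was chosen because it matches how common OpenMP implementations divide a statically scheduled loop.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the helper (an assumption, not the original source).
// Divides n stream values among numThreads threads. The first (n % numThreads)
// threads each get one extra value. The block for thread tid is the half-open
// range [*start, *stop), so *stop is one past the last index of the block.
void getChunkStartStopValues(int tid, int numThreads, const unsigned n,
                             unsigned *start, unsigned *stop) {
    assert(numThreads > 0 && tid >= 0 && tid < numThreads);
    const unsigned id = static_cast<unsigned>(tid);
    const unsigned p = static_cast<unsigned>(numThreads);
    const unsigned base = n / p;    // minimum block size for every thread
    const unsigned extra = n % p;   // number of threads that get one more
    // Each lower-numbered thread took base values, plus one if it was
    // among the first `extra` threads.
    *start = id * base + std::min(id, extra);
    *stop = *start + base + (id < extra ? 1u : 0u);
}
```

With 8 values and 3 threads, this sketch gives the blocks [0, 3), [3, 6), and [6, 8). With 10 values and 4 threads, it gives [0, 3), [3, 6), [6, 8), and [8, 10).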

With this function we can now provide a correct starting index in the stream to the trng generator’s jump function used for splitting the stream into blocks, like this:

unsigned start, stop;
getChunkStartStopValues(tid, numThreads, (const unsigned)repetitions,
                        &start, &stop);

randGen.jump(start); // block split slightly unevenly

In the code snippet above, the number of repetitions of the loop is the number of random numbers we need to generate.

The complete example for looping and creating random numbers is given below. It is the same as the previous “equal chunks” example, except for the change shown above in how we use the jump function. In the previous example, we simply did the following, which assumes that the number of repetitions is divisible by the number of threads:

randGen.jump(tid * (repetitions / numThreads)); // block split
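To see why the integer-division formula breaks down, the sketch below counts how many times each position in the stream would be consumed if every thread started at tid * (repetitions / numThreads) but then ran a chunk of a given size. The function name countStreamUses is hypothetical, and the chunk sizes 3, 3, 2 used in the note that follows are an assumption about how 8 loop iterations are divided among 3 threads; the actual sizes depend on the OpenMP implementation and schedule.

```cpp
#include <vector>

// Hypothetical illustration (not part of the original program).
// For each thread, start at tid * (repetitions / numThreads), as the
// "equal chunks" version does, and consume chunkSizes[tid] stream values.
// Returns how many times each stream position was consumed.
std::vector<int> countStreamUses(unsigned repetitions,
                                 const std::vector<unsigned> &chunkSizes) {
    const unsigned numThreads = static_cast<unsigned>(chunkSizes.size());
    std::vector<int> uses(repetitions, 0);
    if (numThreads == 0) {
        return uses;
    }
    for (unsigned tid = 0; tid < numThreads; ++tid) {
        const unsigned start = tid * (repetitions / numThreads);
        for (unsigned k = 0; k < chunkSizes[tid]; ++k) {
            const unsigned pos = start + k;
            if (pos < repetitions) {   // ignore positions past the stream
                ++uses[pos];
            }
        }
    }
    return uses;
}
```

For 8 repetitions with chunk sizes 3, 3, 2, the counts come out as 1 1 2 1 2 1 0 0: positions 2 and 4 are each used by two threads, and positions 6 and 7 are never used. For 9 repetitions with chunk sizes 3, 3, 3, every position is used exactly once.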

By calculating a separate starting index in the stream of random numbers for each thread, we make each thread's block of numbers line up with the loop iterations that thread is given, even when the blocks are not all the same size.
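The idea can be checked without the trng library. The sketch below is a stand-in, not the example program: it uses std::mt19937 from the C++ standard library and its discard member to play the role that jump plays for the trng generator, and it simulates the threads with an ordinary loop. The function names and the block layout (first n % numThreads threads get one extra value) are assumptions for illustration.

```cpp
#include <algorithm>
#include <random>
#include <vector>

using Value = std::mt19937::result_type;

// Reference: the first n values drawn sequentially from one generator.
std::vector<Value> sequentialStream(Value seed, unsigned n) {
    std::mt19937 gen(seed);
    std::vector<Value> out(n);
    for (unsigned i = 0; i < n; ++i) {
        out[i] = gen();
    }
    return out;
}

// Block splitting with slightly unequal blocks. Each simulated thread
// seeds an identical generator, skips ahead to its own start index,
// and then fills only its own block of the output.
std::vector<Value> blockSplitStream(Value seed, unsigned n,
                                    unsigned numThreads) {
    std::vector<Value> out(n);
    const unsigned base = n / numThreads;
    const unsigned extra = n % numThreads;
    for (unsigned tid = 0; tid < numThreads; ++tid) {
        const unsigned start = tid * base + std::min(tid, extra);
        const unsigned stop = start + base + (tid < extra ? 1u : 0u);
        std::mt19937 gen(seed);
        gen.discard(start);   // stand-in for randGen.jump(start)
        for (unsigned i = start; i < stop; ++i) {
            out[i] = gen();
        }
    }
    return out;
}
```

If the starting indices are right, blockSplitStream(seed, n, numThreads) returns exactly the same vector as sequentialStream(seed, n), whether or not n is divisible by numThreads.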

For the following example, we still use the same command line arguments:

-t indicates the number of threads to use.
-n indicates the number of repetitions of the loop (default is 8).
-c indicates that a fixed seed will be used, resulting in the same
   stream of numbers each time this is run.
-d indicates whether the trng generator will dole out numbers in
   block or in leapfrog fashion. (default is leapfrog).

TO DO:

Scroll to the bottom of this code example, near line 146, and notice the simplest way of using the OpenMP for pragma: it sits inside the parallel block, in front of a loop written just as we would write it for a sequential version. By adding the function that determines where each block of random numbers in the stream should start, we can use the block splitting method of obtaining numbers from the stream.

To illustrate this further, try increasing the number of repetitions to 10 by changing the -n value, and use 4 threads. You should see an assignment of loop indexes and indexes into the stream of numbers that looks like this:

../_images/Stream-10-RNs-block-4threads.png
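One way to check which thread runs which loop index is to record the thread id inside the loop. The sketch below is a hypothetical helper, not part of the example program. It requests a static schedule explicitly, whereas a bare for pragma uses the implementation's default schedule, so the result may differ on some systems. The _OPENMP guards are there so that it still compiles, and runs on a single thread, when OpenMP is not enabled.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns, for each loop index, the id of the
// thread that executed that iteration.
std::vector<int> loopOwners(int repetitions, int numThreads) {
    std::vector<int> owner(repetitions > 0 ? repetitions : 0, -1);
#pragma omp parallel num_threads(numThreads)
    {
        int tid = 0;   // stays 0 when compiled without OpenMP
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // Each iteration writes a distinct element, so no two threads
        // ever write to the same location.
#pragma omp for schedule(static)
        for (int i = 0; i < repetitions; ++i) {
            owner[i] = tid;
        }
    }
    return owner;
}
```

Calling loopOwners(10, 4) in a program built with OpenMP enabled and printing the result gives a list of thread ids, one per loop index, that can be compared against the figure above.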

Note

So we now have the capability of matching the slightly unequal decomposition of the parallel for loop by OpenMP to the decomposition of the blocks using the block splitting technique. In the next section, we will see a use of this to place the random numbers into an array. Following that, we will see an even more important use of this: 2-dimensional grid structures with nested loops.

Command line code for reference

The code block below handles the command line arguments for the parallel versions in this subsection and the previous one. It hasn’t changed.
