Barrier Synchronization and Timing MPI Code

16. The Barrier Synchronization Pattern

A barrier is used when you want all processes to complete a portion of code before any of them continues. Use this exercise to verify that this happens once you add the call to the MPI_Barrier() function: after adding the barrier call, all of the BEFORE strings should be printed before any of the AFTER strings. You can visualize the execution of the program with the barrier function like this, with time moving from left to right:

/* barrier.c
 *  ... illustrates the behavior of MPI_Barrier() ...
 */

#include <stdio.h>   // printf(), snprintf()
#include <string.h>  // strlen()
#include <mpi.h>     // MPI

/* Have workers send messages to the conductor, which prints them.
 * @param: id, an int
 * @param: numProcesses, an int
 * @param: hostName, a const char*
 * @param: position, a const char*
 *
 * Precondition: this function is being called by an MPI process
 *               && id is the MPI rank of that process
 *               && numProcesses is the number of processes in the computation
 *               && hostName points to a char array containing the name of the
 *                    host on which this MPI process is running
 *               && position points to "BEFORE" or "AFTER".
 *
 * Postcondition: each process whose id > 0 has sent a message to process 0
 *                     containing id, numProcesses, hostName, and position
 *                && process 0 has received and output each message.
 */

#define BUFFER_SIZE 200
#define CONDUCTOR   0

void sendReceivePrint(int id, int numProcesses, const char* hostName,
                      const char* position) {
    char buffer[BUFFER_SIZE] = {'\0'};
    MPI_Status status;

    if (id != CONDUCTOR) {
        // Worker: build a message and send it to the conductor.
        // snprintf() truncates instead of overflowing buffer if hostName is long.
        snprintf(buffer, BUFFER_SIZE,
                 "Process #%d of %d on %s is %s the barrier.\n",
                 id, numProcesses, hostName, position);
        MPI_Send(buffer, (int) strlen(buffer) + 1, MPI_CHAR, CONDUCTOR, 0,
                 MPI_COMM_WORLD);
    } else {
        // Conductor: receive and print the messages from all workers,
        // in whatever order they arrive.
        for (int i = 0; i < numProcesses - 1; i++) {
            MPI_Recv(buffer, BUFFER_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
                     MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            printf("%s", buffer);
        }
    }
}

int main(int argc, char** argv) {
    int id = -1, numProcesses = -1, length = -1;
    char myHostName[MPI_MAX_PROCESSOR_NAME] = {'\0'};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);
    MPI_Get_processor_name(myHostName, &length);

    sendReceivePrint(id, numProcesses, myHostName, "BEFORE");

    // MPI_Barrier(MPI_COMM_WORLD);

    sendReceivePrint(id, numProcesses, myHostName, "AFTER");

    MPI_Finalize();
    return 0;
}

To do:

Run the program several times, noting the interleaved outputs.
Uncomment the MPI_Barrier() call; then rerun, noting how the output changes.
Explain what effect MPI_Barrier() has on process behavior.
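The ordering property described at the start of this section (every BEFORE line is printed before any AFTER line once the barrier is in place) can also be checked mechanically on captured output instead of by eye. The sketch below is plain C and not part of the original exercise; the function name barrierOrderHolds is hypothetical, and it assumes each line contains exactly one of the words BEFORE or AFTER, as the messages built by sendReceivePrint() do.

```c
#include <string.h>   // strstr()

/* Hypothetical helper: returns 1 if no line containing "BEFORE" appears
 * after a line containing "AFTER", and 0 otherwise.
 * Assumes each line contains exactly one of the two words. */
int barrierOrderHolds(const char* lines[], int numLines) {
    int seenAfter = 0;                     // have we passed an AFTER line yet?
    for (int i = 0; i < numLines; i++) {
        if (strstr(lines[i], "AFTER") != NULL) {
            seenAfter = 1;
        } else if (strstr(lines[i], "BEFORE") != NULL && seenAfter) {
            return 0;                      // a BEFORE line came too late
        }
    }
    return 1;
}
```

For example, you could redirect the output to a file (e.g. `mpirun -np 4 ./barrier > out.txt`), load its lines into an array, and pass that array to barrierOrderHolds(). Note that the conductor prints no line of its own, so 4 processes produce 3 BEFORE and 3 AFTER lines; with the barrier commented out, some runs may return 0.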

17. Timing code using the Barrier Coordination Pattern

The primary purpose of this exercise is to illustrate one of the most practical uses of a barrier: ensuring that you get legitimate timings for your code. A barrier placed before the start time keeps the conductor from starting the clock before every process has reached that point, and a barrier placed before the end time ensures that all processes have finished before the conductor process records the time. A process that finishes its portion before the others must wait at the barrier, as indicated in green in the diagram below. Thus, the measured parallel execution time is the time it took the slowest process to finish.
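In the program for this exercise, each process is meant to spend (id+1)/numProcs seconds in solveProblem(). With 4 processes those delays are 0.25, 0.5, 0.75, and 1.0 seconds, so the parallel execution time should be roughly 1.0 second, the time of the slowest process. The sketch below is a plain-C model of that arithmetic only (no MPI, and the function names are hypothetical); it assumes every process starts at the same moment and ignores communication overhead.

```c
/* Hypothetical helper: the delay, in seconds, that solveProblem() is meant
 * to impose on process `id` out of `numProcs` processes. */
double intendedDelay(int id, int numProcs) {
    return ((double) id + 1) / numProcs;
}

/* Modeled parallel execution time: the delay of the slowest process,
 * assuming all processes start together. */
double modeledParallelTime(int numProcs) {
    double longest = 0.0;
    for (int id = 0; id < numProcs; id++) {
        double t = intendedDelay(id, numProcs);
        if (t > longest) {
            longest = t;
        }
    }
    return longest;
}
```

Under this model the result is 1.0 for any number of processes, because the highest-ranked process always waits numProcs/numProcs = 1 second.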

In the following code, note that we artificially make each process take a different amount of time.

/* barrier+timing.c
 *  ... illustrates the behavior of MPI_Barrier()
 *       to coordinate process-timing.
 */

#define _POSIX_C_SOURCE 200809L  // expose nanosleep() in <time.h>

#include <stdio.h>   // printf()
#include <time.h>    // nanosleep(), struct timespec
#include <mpi.h>     // MPI

#define CONDUCTOR 0

/* answer the ultimate question of life, the universe,
 *  and everything, based on id and numProcs.
 * @param: id, an int
 * @param: numProcs, an int
 * Precondition: id is the MPI rank of this process
 *             && numProcs is the number of MPI processes.
 * Postcondition: The return value is 42.
 */
int solveProblem(int id, int numProcs) {
    // Simulate work: process id waits (id+1)/numProcs seconds.
    // sleep() accepts only whole seconds (it would truncate every delay
    // but the last to 0), so nanosleep() is used to keep the fractions.
    double seconds = ((double) id + 1) / numProcs;
    struct timespec delay;
    delay.tv_sec  = (time_t) seconds;
    delay.tv_nsec = (long) ((seconds - (double) delay.tv_sec) * 1e9);
    nanosleep(&delay, NULL);

    return 42;
}

int main(int argc, char** argv) {
    int id = -1, numProcesses = -1;
    double startTime = 0.0, totalTime = 0.0;
    int answer = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);

    // MPI_Barrier(MPI_COMM_WORLD);
    if (id == CONDUCTOR) {
        startTime = MPI_Wtime();
    }

    answer = solveProblem(id, numProcesses);

    // MPI_Barrier(MPI_COMM_WORLD);
    if (id == CONDUCTOR) {
        totalTime = MPI_Wtime() - startTime;
        printf("\nThe answer is %d; computing it took %f secs.\n\n",
               answer, totalTime);
    }

    MPI_Finalize();
    return 0;
}

To do:

Run the program with the two MPI_Barrier() calls commented out, and then again with them uncommented.
Run the code several times and determine the average, median, and minimum execution time when the code has a barrier and when it does not. You could use a spreadsheet for this.
Without the barrier, which process is being timed?
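As an alternative to a spreadsheet, the summary statistics these exercises ask for can be computed with a few lines of C. This is a sketch with hypothetical function names; it uses qsort() from the C standard library for the median and the usual convention of averaging the two middle values when the number of runs is even.

```c
#include <stdlib.h>   // qsort()

/* qsort() comparison function: ascending order of doubles. */
static int compareDoubles(const void* a, const void* b) {
    double x = *(const double*) a;
    double y = *(const double*) b;
    return (x > y) - (x < y);
}

/* Mean of n run times (requires n > 0). */
double averageTime(const double times[], int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += times[i];
    }
    return sum / n;
}

/* Smallest of n run times (requires n > 0). */
double minimumTime(const double times[], int n) {
    double smallest = times[0];
    for (int i = 1; i < n; i++) {
        if (times[i] < smallest) {
            smallest = times[i];
        }
    }
    return smallest;
}

/* Median of n run times (requires n > 0); sorts the array in place. */
double medianTime(double times[], int n) {
    qsort(times, (size_t) n, sizeof(double), compareDoubles);
    if (n % 2 == 1) {
        return times[n / 2];
    }
    return (times[n / 2 - 1] + times[n / 2]) / 2.0;
}
```

You would fill an array with the times printed by each run (once for the version with barriers and once for the version without) and call the three functions on it. Because medianTime() reorders the array, call it last if the original order matters to you.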

18. Timing code using the Reduction pattern

We can also use a reduction to obtain the parallel execution time of a program. In this example, each process records how long it took to finish its own work. These local times are then reduced to a single time using the maximum operator (MPI_MAX), which yields the largest local time across all processes.
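MPI_Reduce() does the combining for you. One common way to organize a reduction like this is as a tree: values are combined pairwise in rounds, so p values need only ceil(log2(p)) rounds. The sketch below simulates that idea sequentially in plain C on an array of local times. It illustrates the concept under that assumption and is not a description of how any particular MPI library implements MPI_Reduce(); the function name treeReduceMax is hypothetical.

```c
/* Hypothetical sequential simulation of a tree-shaped max-reduction.
 * values[i] plays the role of process i's local time (requires n > 0);
 * the array is modified in place. In each round, index i absorbs the
 * value at index i + stride, and stride doubles after every round. */
double treeReduceMax(double values[], int n) {
    for (int stride = 1; stride < n; stride *= 2) {
        for (int i = 0; i + stride < n; i += 2 * stride) {
            if (values[i + stride] > values[i]) {
                values[i] = values[i + stride];
            }
        }
    }
    return values[0];   // the maximum ends up at index 0, the "root"
}
```

Because max is associative and commutative, combining the values in this pairwise order gives the same answer as scanning them one at a time, which is why a reduction is free to choose such an order.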

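/* reduce+timing.c
 *  ... illustrates the use of MPI_Reduce() with MPI_MAX
 *       to coordinate process-timing.
 */

#define _POSIX_C_SOURCE 200809L  // expose nanosleep() in <time.h>

#include <stdio.h>   // printf()
#include <time.h>    // nanosleep(), struct timespec
#include <mpi.h>     // MPI

#define CONDUCTOR 0

/* answer the ultimate question of life, the universe,
 *  and everything, based on id and numProcs.
 * @param: id, an int
 * @param: numProcs, an int
 * Precondition: id is the MPI rank of this process
 *             && numProcs is the number of MPI processes.
 * Postcondition: The return value is 42.
 */
int solveProblem(int id, int numProcs) {
    // Simulate work: process id waits (id+1)/numProcs seconds.
    // sleep() accepts only whole seconds (it would truncate every delay
    // but the last to 0), so nanosleep() is used to keep the fractions.
    double seconds = ((double) id + 1) / numProcs;
    struct timespec delay;
    delay.tv_sec  = (time_t) seconds;
    delay.tv_nsec = (long) ((seconds - (double) delay.tv_sec) * 1e9);
    nanosleep(&delay, NULL);

    return 42;
}

int main(int argc, char** argv) {
    int id = -1, numProcesses = -1;
    double startTime = 0.0, localTime = 0.0, totalTime = 0.0;
    int answer = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);

    MPI_Barrier(MPI_COMM_WORLD);   // every process starts timing together
    startTime = MPI_Wtime();

    answer = solveProblem(id, numProcesses);

    // Each process times its own work ...
    localTime = MPI_Wtime() - startTime;
    // ... and the largest local time is delivered to the conductor.
    MPI_Reduce(&localTime, &totalTime, 1, MPI_DOUBLE,
               MPI_MAX, CONDUCTOR, MPI_COMM_WORLD);

    if (id == CONDUCTOR) {
        printf("\nThe answer is %d; computing it took %f secs.\n\n",
               answer, totalTime);
    }

    MPI_Finalize();
    return 0;
}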

To do:

Run the program five times.
In a spreadsheet, compute the average, median, and minimum of the five times.
Explain the behavior of MPI_Reduce() in terms of localTime and totalTime.
Compare these results to the results from barrier+timing.c.
