4.1. Message Passing Pattern: Key Problem

The following code illustrates a common error that many programmers have inadvertently written into their own programs. The idea behind this program is to have pairs of processes communicate with each other, like this:

../_images/pair_exchange.png

For message passing to work between a pair of processes, one must send and the other must receive. To exchange data, each process must perform both a send and a receive. The idea is that process 0 sends data to process 1, which receives it, and process 1 sends data back to process 0, which receives it. Similarly, processes 2 and 3 exchange messages: process 2 sends data to process 3, and process 3 sends data to process 2.

If we have more processes, we still want to pair them up to exchange messages. Each process chooses its partner based on its own process id. If your id is odd (1 and 3 in the above diagram), you send to and receive from the neighbor whose id is id - 1. If your id is even (0 and 2), you send to and receive from the neighbor whose id is id + 1. This works for more than 4 processes too, as long as the number of processes is even.

Warning

The following code has a problem called deadlock. Deadlock occurs when every process is waiting on an action from another process, so none of them can proceed and the program never completes. On Linux systems such as the Raspberry Pi, press Ctrl-C (hold the Control key and press c) to stop the program.

Navigate to: ../04.messagePassingDeadlock/

Make and run the code on 4 MPI processes:

make

mpirun -hostfile ~/hostfile -np 4 ./messagePassingDeadlock

Here, the 4 after -np is the number of MPI processes to start.

Exercise:

4.1.1. Explore the code

In this code, can you trace what is happening to cause the deadlock?

#include <stdio.h>
#include <mpi.h>

int odd(int number) { return number % 2; }

int main(int argc, char** argv) {
    int id = -1, numProcesses = -1;
    int sendValue = -1, receivedValue = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);

    if (numProcesses > 1) {
        sendValue = id;
        if ( odd(id) ) {  // odd processes receive from their 'left neighbor', then send
            MPI_Recv(&receivedValue, 1, MPI_INT, id-1, 2,
                       MPI_COMM_WORLD, &status);
            MPI_Send(&sendValue, 1, MPI_INT, id-1, 1, MPI_COMM_WORLD);

        } else {          // even processes receive from their 'right neighbor', then send
            MPI_Recv(&receivedValue, 1, MPI_INT, id+1, 1,
                       MPI_COMM_WORLD, &status);
            MPI_Send(&sendValue, 1, MPI_INT, id+1, 2, MPI_COMM_WORLD);
        }

        printf("Process %d of %d computed %d and received %d\n",
                id, numProcesses, sendValue, receivedValue);
    } else if ( !id) {  // only process 0 does this part
        printf("\nPlease run this program using -np N where N is positive and even.\n\n");
    }

    MPI_Finalize();
    return 0;
}

Can you think of how to fix this problem?

Go to the next example to see the solution.
