# CHAPTER 4: Parallel Sum

Very often, loops are used with an accumulator variable to compute a single value from a set of values, such as the sum
of the integers in an array or list. Fortunately, OpenMP implements a special parallel pattern known as *reduction*, which will
aid us in this process. The reduction pattern is one of a group of patterns called **collective communication** patterns,
because the threads must somehow work together to create the final desired single value.

## 4.1 An Initial Attempt

From the `ParallelLoopChunksOf1` directory, let’s enter the `reduction` directory by typing the following command:

```
cd ../reduction
```

Running `ls` in this new directory should reveal the following two files: `Makefile` and `reduction.c`.

The code in `reduction.c` is fairly long, so we will discuss it in parts:

```
17 #include <stdio.h>   // printf()
18 #include <omp.h>     // OpenMP
19 #include <stdlib.h>  // rand()
20
21 void initialize(int* a, int n);
22 int sequentialSum(int* a, int n);
23 int parallelSum(int* a, int n);
24
25 #define SIZE 1000000
26
27 int main(int argc, char** argv) {
28     int array[SIZE];
29
30     if (argc > 1) {
31         omp_set_num_threads( atoi(argv[1]) );
32     }
33
34     initialize(array, SIZE);
35     printf("\nSeq. sum: \t%d\nPar. sum: \t%d\n\n",
36            sequentialSum(array, SIZE),
37            parallelSum(array, SIZE) );
38
39     return 0;
40 }
```

The `main()` function in `reduction.c` first initializes an array of 1 million random integers.
It then calls two separate functions: `sequentialSum()`, which sums the array sequentially,
and `parallelSum()`, which attempts to sum the array in parallel.

For simplicity, we next show only the `parallelSum()` function:

```
60 /* sum the array using multiple threads */
61 int parallelSum(int* a, int n) {
62     int sum = 0;
63     int i;
64     // #pragma omp parallel for // reduction(+:sum)
65     for (i = 0; i < n; i++) {
66         sum += a[i];
67     }
68     return sum;
69 }
```

In its current form, the function is *identical* to the `sequentialSum()` function. However, line 64 contains
some commented-out code that will parallelize this function. For now, let’s keep line 64 commented out.
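
The `sequentialSum()` function is not shown in this excerpt, but based on the description above it is presumably a plain accumulator loop along these lines (a sketch, not necessarily the file’s exact code):

```
/* sum the array using a single thread: one accumulator, no pragma */
int sequentialSum(int* a, int n) {
    int sum = 0;
    int i;
    for (i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```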

### Try It Out

First, let’s compile and run the program using the following commands:

```
make
./reduction 4
```

Q-1: Run the code a few times using 4 threads as the second parameter. What do you see?

- The sequential sum and parallel sum functions output the same sum.
- Correct! However, the reason should be obvious: at this point, BOTH functions are running the code serially!
- The sequential sum and parallel sum functions output different sums.
- Not correct; did you make sure that you didn't change the code? Remember, at this point, we are running the code with line 64 still commented out.

## 4.2 Edit, Build and Run in Parallel

Let’s next edit the `parallelSum()` function so that it executes in parallel by removing the `//` comment marker at the front of line 64.
Be sure to remove ONLY the first `//` (keep the reduction clause commented out for now).
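
After this edit, line 64 should look like the following (the pragma is now active, while the reduction clause remains commented out):

```
64     #pragma omp parallel for // reduction(+:sum)
```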

Recompile and rerun the program using the following commands:

```
make
./reduction 4
```

Q-2: Run the code a few times using 4 threads. What do you see?

- The sequential sum and parallel sum functions output the same sum.
- Not quite; try recompiling (run "make") and running the program a few times. Is it always the same sum?
- The sequential sum and parallel sum functions output different sums.
- Correct! It looks like the "omp parallel for" pragma is not enough in this case!

Line 64 of the `parallelSum()` function currently contains a (commented-out) special clause called `reduction` that can be used with
the `omp parallel for` pragma. The notion of a reduction comes from the mathematical operation
*reduce*, in which a collection of values is combined into a single value via a common mathematical function.
Summing a collection of values is therefore a natural example of a reduction.

In this program, the `reduction(+:sum)` clause indicates that a summation is being performed (the `+` identifier)
on the variable `sum`, which acts as the accumulator.

Q-3: Remove the second // on line 64, and recompile. Re-run the code a few times using 4 threads. What do you see?

- The sequential sum and parallel sum functions output the same sum.
- Correct! The program now correctly computes the sum in parallel.
- The sequential sum and parallel sum functions output different sums.
- Did you remember to uncomment the reduction clause on line 64? Did you recompile using "make"?
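
Once the second `//` is removed, the pragma on line 64 includes the reduction clause, and the completed function should look like this:

```
60 /* sum the array using multiple threads */
61 int parallelSum(int* a, int n) {
62     int sum = 0;
63     int i;
64     #pragma omp parallel for reduction(+:sum)
65     for (i = 0; i < n; i++) {
66         sum += a[i];
67     }
68     return sum;
69 }
```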

## 4.3 Something to think about

Do you have any ideas about why the `omp parallel for` pragma without the reduction clause did not produce the correct
result?

Q-4: What is going on?

- Accumulators in loops are something that cannot be parallelized properly.
- Nope. This program CAN be properly parallelized. We are just doing something wrong!
- The program is losing values or overwriting values somewhere (i.e., there is a race condition).
- Correct!
- Fairies have infested the computer and are wreaking havoc as we speak.
- Nope! Thankfully, the Raspberry Pi is fairy-proof. :)
- It's some other issue.
- Actually, the issue is listed as one of the options!
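
Without the reduction clause, all of the threads read and update the shared variable `sum` at the same time. Two threads can each read the same old value of `sum`, add their own element, and write back, so one of the additions is lost. The `reduction(+:sum)` clause avoids this by giving each thread its own private copy of `sum` and combining the private copies at the end. The following sketch (using the hypothetical name `manualReductionSum()`, which is not part of `reduction.c`) shows roughly the same idea written out by hand:

```
/* a sketch of roughly what reduction(+:sum) arranges for us */
int manualReductionSum(int* a, int n) {
    int sum = 0;
    #pragma omp parallel
    {
        int localSum = 0;            // private to each thread: no race
        #pragma omp for
        for (int i = 0; i < n; i++) {
            localSum += a[i];
        }
        #pragma omp critical         // one thread at a time updates sum
        sum += localSum;
    }
    return sum;
}
```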

The sum program presented in this chapter was a fairly small example. In the next two chapters, we will explore much larger examples.