# CHAPTER 5: Integration using the Trapezoidal Rule¶

## 5.1 Estimating the Area Under a Curve¶

As our next example, let’s look at the problem of estimating the area under a curve. If you have taken a calculus course, you may recognize this problem as the Riemann sum or Trapezoidal Rule, which approximates the area under the curve (i.e. the integral) by splitting the area under the curve into a series of trapezoids.

In the following programs, we attempt to use the trapezoid rule to approximate the integral

using \(2^{20}\) equal subdivisions. The answer from this computation should be 2.0. The following video shows how a single thread would solve this problem:

In this example, the single thread serially computes the area of each trapezoid and adds all the trapezoids together to one value.

A C implementation of this program may look like the following:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | ```
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
//function we are calculating the area under
double f(double x) {
return sin(x);
}
//estimation of pi
const double pi = 3.141592653589793238462643383079;
int main(int argc, char** argv) {
//Variables
double a = 0.0, b = pi; //limits of integration
int n = 1048576; //number of subdivisions = 2^20
double h = (b - a) / n; //width of each subdivision
double integral; // accumulates answer
integral = (f(a) + f(b))/2.0; //initial value
//sum up all the trapezoids
int i;
for(i = 1; i < n; i++) {
integral += f(a+i*h);
}
integral = integral * h;
printf("With %d trapezoids, our estimate of the integral from \n", n);
printf("%f to %f is %f\n", a,b,integral);
return 0;
}
``` |

Let’s now consider how we can use multiple threads to approximate the area under the curve in parallel. One strategy would be to
assign each thread a *subset* of the total set of subdivisions, so that each thread separately computes its assigned set of trapezoids.

The following video illustrates how 4 threads would work together to approximate the area under the curve:

## 5.2 Parallel Integration - First Attempt¶

Let’s visit the `integration`

directory by using the following command (note that this directory is on the same level as the `patternlets`

directory) in the `reduction`

directory:

```
cd ../../integration
```

The following program attempts to update the serial version of the integration program with OpenMP pragmas:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | ```
#include <math.h>
#include <stdio.h> // printf()
#include <stdlib.h> // atoi()
#include <omp.h> // OpenMP
/* Demo program for OpenMP:
computes trapezoidal approximation to an integral*/
const double pi = 3.141592653589793238462643383079;
int main(int argc, char** argv) {
/* Variables */
double a = 0.0, b = pi; /* limits of integration */;
int n = 1048576; /* number of subdivisions = 2^20 */
double h = (b - a) / n; /* width of subdivision */
double integral; /* accumulates answer */
int threadcnt = 1;
double f(double x);
/* parse command-line arg for number of threads */
if (argc > 1) {
threadcnt = atoi(argv[1]);
}
#ifdef _OPENMP
omp_set_num_threads( threadcnt );
printf("OMP defined, threadct = %d\n", threadcnt);
#else
printf("OMP not defined");
#endif
integral = (f(a) + f(b))/2.0;
int i;
#pragma omp parallel for \
private(i) shared (a, n, h, integral)
for(i = 1; i < n; i++) {
integral += f(a+i*h);
}
integral = integral * h;
printf("With %d trapezoids, our estimate of the integral from \n", n);
printf("%f to %f is %f\n", a,b,integral);
}
double f(double x) {
return sin(x);
}
``` |

Lines 36-37 contain the `omp parallel for`

pragma that parallelizes the loop that is computing the
integral. This example is different from others you have seen so far, in that the variables are explicitly declared to be
either `private`

(each thread gets its own copy) or `shared`

(all threads use the same copy of the variable in memory).

If our parallel program is implemented correctly, the program should estimate the area under the curve as 2.00, which would be identical to the output of the serial program.

First, run this program serially on 1 thread by typing

```
make
./trap-notworking 1
```

- The program works as expected (outputs answer 2.0)
- Correct! This program runs serially, so there is no issue.
- The program returns a different approximate value (but the same value with every run)
- Not quite. Did you run the program using 1 thread?
- The program returns a different approximate value every run.
- Not quite. Did you run the program using 1 thread?

Q-3: Run the OpenMP implementation of integration serially a few times. What do you discover?

Next run this code in parallel using 4 threads by typing:

```
make
./trap-notworking 4
```

As with other examples you have done so far, the command-line argument of `4`

above indicates use four threads.

- The program works as expected (outputs answer 2.0)
- Not quite. Try running the program a few more times. What do you see?
- The program returns a different approximate value (but the same value with every run)
- Close, but not correct. Try running the program again. Do you really get the same answer?
- The program returns a different approximate value every run.
- Correct. Something is seriously wrong with our program!

Q-4: Run the OpenMP implementation of integration a few times. What do you discover?

- This scenario illustrates a classic limitation of certain types of parallelism; this is ane example of something that cannot be properly parallelized.
- Nope. This program CAN be properly parallelized. We are just doing something wrong!
- There is a race condition in this code.
- Correct! Do you know where it is?
- It's some other issue.
- Actually, the issue is listed in one of the options!

Q-5: Do you have any guesses on what could be causing the issue?

## 5.3 Parallel Integration - Second Attempt¶

We will use the previously discussed `reduction`

clause to fix the issue. Here is the improved version, called `trap-omp-working.c`

:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | ```
#include <math.h>
#include <stdio.h> // printf()
#include <stdlib.h> // atoi()
#include <omp.h> // OpenMP
/* Demo program for OpenMP:
computes trapezoidal approximation to an integral*/
const double pi = 3.141592653589793238462643383079;
int main(int argc, char** argv) {
/* Variables */
double a = 0.0, b = pi; /* limits of integration */;
int n = 1048576; /* number of subdivisions = 2^20 */
double h = (b - a) / n; /* width of subdivision */
double integral; /* accumulates answer */
int threadcnt = 1;
double f(double x);
/* parse command-line arg for number of threads */
if (argc > 1) {
threadcnt = atoi(argv[1]);
}
#ifdef _OPENMP
omp_set_num_threads( threadcnt );
printf("OMP defined, threadct = %d\n", threadcnt);
#else
printf("OMP not defined");
#endif
integral = (f(a) + f(b))/2.0;
int i;
#pragma omp parallel \
for private(i) shared (a, n, h) reduction(+: integral)
for(i = 1; i < n; i++) {
integral += f(a+i*h);
}
integral = integral * h;
printf("With %d trapezoids, our estimate of the integral from \n", n);
printf("%f to %f is %f\n", a,b,integral);
}
double f(double x) {
return sin(x);
}
``` |

As with the previous reduction patternlet example, we again have an accumulator variable in the loop, this time called
`integral`

. Thus our reduction clause reads `reduction(+:integral)`

.

The above program was compiled with our first use of the `make`

command. Run the above program using the following command:

```
./trap-working 4
```

Note

Because we did not explicitly add an additional `static`

or `dynamic`

clause to the pragma on lines 37-38 on the working
version above, the default behavior of assigning equal amounts of work doing consecutive iterations of the loop (i.e. static
scheduling) was used to decompose the problem onto threads. In this case, since the number of trapezoids used was 1048576, then
with 4 threads, thread 0 will do the first 1048576/4 trapezoids, thread 1 the next 1048576/4 trapezoids, and so on.