# CHAPTER 5: Integration using the Trapezoidal Rule¶

## 5.1 Estimating the Area Under a Curve¶

As our next example, let’s look at the problem of estimating the area under a curve. If you have taken a calculus course, you may recognize this problem as the Riemann sum or Trapezoidal Rule, which approximates the area under the curve (i.e. the integral) by splitting the area under the curve into a series of trapezoids.

In the following programs, we attempt to use the trapezoid rule to approximate the integral

$\int_0^{\pi} sin(x)_{dx}$

using $$2^{20}$$ equal subdivisions. The answer from this computation should be 2.0. The following video shows how a single thread would solve this problem:

In this example, the single thread serially computes the area of each trapezoid and adds all the trapezoids together to one value.

A C implementation of this program may look like the following:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 #include #include #include //function we are calculating the area under double f(double x) { return sin(x); } //estimation of pi const double pi = 3.141592653589793238462643383079; int main(int argc, char** argv) { //Variables double a = 0.0, b = pi; //limits of integration int n = 1048576; //number of subdivisions = 2^20 double h = (b - a) / n; //width of each subdivision double integral; // accumulates answer integral = (f(a) + f(b))/2.0; //initial value //sum up all the trapezoids int i; for(i = 1; i < n; i++) { integral += f(a+i*h); } integral = integral * h; printf("With %d trapezoids, our estimate of the integral from \n", n); printf("%f to %f is %f\n", a,b,integral); return 0; } 

Let’s now consider how we can use multiple threads to approximate the area under the curve in parallel. One strategy would be to assign each thread a subset of the total set of subdivisions, so that each thread separately computes its assigned set of trapezoids.

The following video illustrates how 4 threads would work together to approximate the area under the curve:

## 5.2 Parallel Integration - First Attempt¶

Let’s visit the integration directory by using the following command (note that this directory is on the same level as the patternlets directory) in the reduction directory:

cd ../../integration


The following program attempts to update the serial version of the integration program with OpenMP pragmas:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 #include #include // printf() #include // atoi() #include // OpenMP /* Demo program for OpenMP: computes trapezoidal approximation to an integral*/ const double pi = 3.141592653589793238462643383079; int main(int argc, char** argv) { /* Variables */ double a = 0.0, b = pi; /* limits of integration */; int n = 1048576; /* number of subdivisions = 2^20 */ double h = (b - a) / n; /* width of subdivision */ double integral; /* accumulates answer */ int threadcnt = 1; double f(double x); /* parse command-line arg for number of threads */ if (argc > 1) { threadcnt = atoi(argv[1]); } #ifdef _OPENMP omp_set_num_threads( threadcnt ); printf("OMP defined, threadct = %d\n", threadcnt); #else printf("OMP not defined"); #endif integral = (f(a) + f(b))/2.0; int i; #pragma omp parallel for \ private(i) shared (a, n, h, integral) for(i = 1; i < n; i++) { integral += f(a+i*h); } integral = integral * h; printf("With %d trapezoids, our estimate of the integral from \n", n); printf("%f to %f is %f\n", a,b,integral); } double f(double x) { return sin(x); } 

Lines 36-37 contain the omp parallel for pragma that parallelizes the loop that is computing the integral. This example is different from others you have seen so far, in that the variables are explicitly declared to be either private (each thread gets its own copy) or shared (all threads use the same copy of the variable in memory).

If our parallel program is implemented correctly, the program should estimate the area under the curve as 2.00, which would be identical to the output of the serial program.

First, run this program serially on 1 thread by typing

make
./trap-notworking 1


Next run this code in parallel using 4 threads by typing:

make
./trap-notworking 4


As with other examples you have done so far, the command-line argument of 4 above indicates use four threads.

## 5.3 Parallel Integration - Second Attempt¶

We will use the previously discussed reduction clause to fix the issue. Here is the improved version, called trap-omp-working.c:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 #include #include // printf() #include // atoi() #include // OpenMP /* Demo program for OpenMP: computes trapezoidal approximation to an integral*/ const double pi = 3.141592653589793238462643383079; int main(int argc, char** argv) { /* Variables */ double a = 0.0, b = pi; /* limits of integration */; int n = 1048576; /* number of subdivisions = 2^20 */ double h = (b - a) / n; /* width of subdivision */ double integral; /* accumulates answer */ int threadcnt = 1; double f(double x); /* parse command-line arg for number of threads */ if (argc > 1) { threadcnt = atoi(argv[1]); } #ifdef _OPENMP omp_set_num_threads( threadcnt ); printf("OMP defined, threadct = %d\n", threadcnt); #else printf("OMP not defined"); #endif integral = (f(a) + f(b))/2.0; int i; #pragma omp parallel \ for private(i) shared (a, n, h) reduction(+: integral) for(i = 1; i < n; i++) { integral += f(a+i*h); } integral = integral * h; printf("With %d trapezoids, our estimate of the integral from \n", n); printf("%f to %f is %f\n", a,b,integral); } double f(double x) { return sin(x); } 

As with the previous reduction patternlet example, we again have an accumulator variable in the loop, this time called integral. Thus our reduction clause reads reduction(+:integral).

The above program was compiled with our first use of the make command. Run the above program using the following command:

./trap-working 4


Note

Because we did not explicitly add an additional static or dynamic clause to the pragma on lines 37-38 on the working version above, the default behavior of assigning equal amounts of work doing consecutive iterations of the loop (i.e. static scheduling) was used to decompose the problem onto threads. In this case, since the number of trapezoids used was 1048576, then with 4 threads, thread 0 will do the first 1048576/4 trapezoids, thread 1 the next 1048576/4 trapezoids, and so on.