2.9 Real World Example - Forest Fire Simulation

Ported to Python by Libby Shoop (Macalester College), from the original [Shodor foundation C code example](http://www.shodor.org/refdesk/Resources/Tutorials/BasicMPI/).

[Image: photograph of a forest fire (fire_pic.jpg)]

The goal is to simulate a forest fire using an N x N grid of trees. If a tree starts to smolder, each tree around it has some chance of catching fire. The model follows this set of rules (from Shodor):

  • Each tree is in one of these states: unburnt (alive), smoldering, burning, or burnt.

  • A smoldering tree begins to burn, and a burning tree eventually burns down.

  • Each unburnt tree next to a burning tree has some chance of catching fire (starting to smolder), controlled by a probability threshold.

  • The fire is over when no trees are smoldering or burning.

The main input parameters for the model are:

  • N, the length of one row of the N x N grid of trees, and

  • the probability threshold for the fire spreading from a burning tree to an unburnt neighbor.

The simulation of one tree burning starts with the tree in the center of the grid smoldering, and all other trees alive.

The main outputs for the single fire model are:

  • the number of iterations until the fire burns out,

  • the percentage of trees burned, and

  • a textual picture of the final forest.

The fire functions

There are several helper functions that a single fire simulation uses.
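As a rough Python sketch of what these helpers might look like (only the name forest_burns() appears in the text; the other names, the four tree states, and the NumPy representation are illustrative assumptions):

```python
import random
import numpy as np

# Tree states; the four-state model is an assumption for this sketch.
UNBURNT, SMOLDERING, BURNING, BURNT = 0, 1, 2, 3

def initialize_forest(forest_size):
    """Return an NxN forest of live (unburnt) trees."""
    return np.full((forest_size, forest_size), UNBURNT, dtype=np.int8)

def light_tree(forest, row, col):
    """Start the fire by setting one tree to smoldering."""
    forest[row, col] = SMOLDERING

def forest_is_burning(forest):
    """The fire is still active while any tree smolders or burns."""
    return np.any((forest == SMOLDERING) | (forest == BURNING))

def forest_burns(forest, prob_spread):
    """Advance the fire by one time step."""
    n = forest.shape[0]
    new_forest = forest.copy()
    for row in range(n):
        for col in range(n):
            if forest[row, col] == BURNING:
                new_forest[row, col] = BURNT
                # A burning tree may ignite each unburnt neighbor.
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    r, c = row + dr, col + dc
                    if 0 <= r < n and 0 <= c < n and forest[r, c] == UNBURNT:
                        if random.random() < prob_spread:
                            new_forest[r, c] = SMOLDERING
            elif forest[row, col] == SMOLDERING:
                new_forest[row, col] = BURNING
    forest[:] = new_forest

def get_percent_burned(forest):
    """Percentage (0 to 100) of trees that have burned down."""
    return 100.0 * np.count_nonzero(forest == BURNT) / forest.size
```

Note that a new copy of the forest is computed each step, so a tree ignited in this step does not spread fire until the next step.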

A single fire burning

The first command line argument to the single-fire program represents the length of one row, N, of an NxN forest.

The second argument is the probability threshold of spreading. In the forest_burns() function, if a tree is burning and its neighbor is not, a random number between 0 and 1.0 is generated to decide whether the neighbor will catch fire. If the number is less than the probability threshold, the neighboring tree is set to smolder.

The output of running the above single fire shows an approximation of how long it takes for the fire to burn out by counting the number of times the function forest_burns() executes. This is given as the iterations until the fire burns out.

The next part of the output shows the percentage of trees in the forest burned on this particular trial.

Lastly, a textual output of what the final forest looks like is shown, where a ‘Y’ is a live tree and a ‘.’ is a dead tree.
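A driver consistent with the inputs and outputs described above might look like the following sketch. It assumes the helper functions from the earlier sketch are defined in the same file; the exact output format of the interactive version may differ.

```python
import sys

# Assumes the helpers and tree states from the sketch above
# (initialize_forest, light_tree, forest_is_burning, forest_burns,
#  get_percent_burned, BURNT) are defined in the same file.

def print_forest(forest):
    """Show the final forest: 'Y' for a live tree, '.' for a dead (burnt) tree."""
    for row in forest:
        print("".join("." if tree == BURNT else "Y" for tree in row))

def main():
    forest_size = int(sys.argv[1])    # first argument: length of one row, N
    prob_spread = float(sys.argv[2])  # second argument: probability threshold of spreading

    forest = initialize_forest(forest_size)
    # The fire starts with the center tree smoldering; all other trees are alive.
    light_tree(forest, forest_size // 2, forest_size // 2)

    iterations = 0
    while forest_is_burning(forest):
        forest_burns(forest, prob_spread)
        iterations += 1

    print("Iterations until the fire burned out:", iterations)
    print("Percent of trees burned: {:.1f}%".format(get_percent_burned(forest)))
    print_forest(forest)

if __name__ == "__main__":
    main()
```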

Warning

In this case, this program should run on only one process, so please do not change the -np mpirun flag.

Exercises

A single instance of this one-fire model can produce a different result each time it is run, because whether an unburnt tree catches fire is decided randomly.

  1. Try running this several times at the same 0.5 threshold. Notice how the iterations and the percent burned change.

  2. Try varying the threshold from 0.2 to 1.0 by 0.1 increments, running it several times at each threshold.

Note

Generally, what you should have seen is that as the probability threshold increases, the average percent burned over your trials increases, as does the average number of iterations. This makes sense in terms of a real fire: if trees are more likely to catch fire (the threshold is higher), then more of them should burn, and it should take longer for the fire to burn out.

A full simulation on one process

Each time the previous code is run on one forest, the result is different. In addition, even over several trials, the average percent of trees burned and the average number of iterations before the fire burns out depend on the input probability threshold. Because of this, a more realistic simulation requires that many instances of the above single simulation be run in this way:

  • Keep the size of the grid of trees the same.

  • Start with a low probability threshold.

  • Run a given number of trials at a fixed probability threshold.

  • Repeat for the next probability threshold, a fixed increment larger than the previous one, until the threshold reaches 1.0.

This simulation of multiple trials at a range of different probability thresholds has a known interesting output, which can be graphed as follows:

[Image: forest fire simulation with multiple trials on 1 process (forest_fire_simulation_with_multiple_trials_1_process.png): average percent of trees burned as a function of the probability threshold of spreading]

In this case, we ran 20 trials on a single Raspberry Pi 3B, with the probability threshold starting at 0.1 and incrementing by 0.1.

As the size of the grid changes and the number of probability points increases, this curve will look roughly the same, although it should get smoother as the number of trials increases and the increment value gets smaller. But these more accurate simulations take a long time to run.

There are a couple of additional functions that a full simulation uses.
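For illustration, here is one way such helpers could be written on top of the single-fire functions sketched earlier; the names burn_until_out() and run_trials() are assumptions, not necessarily those used in the interactive code.

```python
# Assumes the single-fire helpers sketched earlier in this section.

def burn_until_out(forest_size, prob_spread):
    """Run one complete single-fire trial; return the number of iterations
    until the fire burns out and the percent of trees burned."""
    forest = initialize_forest(forest_size)
    light_tree(forest, forest_size // 2, forest_size // 2)

    iterations = 0
    while forest_is_burning(forest):
        forest_burns(forest, prob_spread)
        iterations += 1
    return iterations, get_percent_burned(forest)

def run_trials(forest_size, prob_spread, n_trials):
    """Average the results of n_trials independent fires at one threshold."""
    total_iterations = 0
    total_percent = 0.0
    for _ in range(n_trials):
        iterations, percent = burn_until_out(forest_size, prob_spread)
        total_iterations += iterations
        total_percent += percent
    return total_iterations / n_trials, total_percent / n_trials
```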

The single-processor version of the simulation starts at a probability threshold of 0.1 and runs a set of trials using that threshold. It then increases the threshold by a given increment amount and runs another set of trials. This is repeated until the threshold value is just below 1.0. For example, if the threshold increment is given as 0.1, the set of thresholds run will be:

0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9

It takes three arguments in this order (a sketch of such a driver appears after the list):

  1. size of a row in a square forest of trees

  2. amount to increment for each new probability threshold

  3. number of trials to run at each threshold
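A sketch of that driver's loop structure, assuming the run_trials() helper sketched above; the stopping guard of 0.9999 simply keeps floating point drift from adding an extra threshold at 1.0.

```python
import sys

# Assumes run_trials() from the helper sketch above.

def main():
    forest_size = int(sys.argv[1])       # 1. size of a row in the square forest
    prob_increment = float(sys.argv[2])  # 2. amount to increment each new threshold
    n_trials = int(sys.argv[3])          # 3. number of trials to run at each threshold

    prob_spread = 0.1                    # the first threshold, as described above
    while prob_spread < 0.9999:          # stop just below 1.0
        avg_iterations, avg_percent = run_trials(forest_size, prob_spread, n_trials)
        print("threshold {:.2f}: average {:.1f} iterations, {:.1f}% of trees burned".format(
            prob_spread, avg_iterations, avg_percent))
        prob_spread += prob_increment

if __name__ == "__main__":
    main()
```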

Warning

In this case, this program should run on only one process, so please do not change the -np mpirun flag.

Exercise

Run this with the default command line arguments given: a 20x20 forest, a probability threshold increment of 0.1, and 10 trials per threshold. Note the time. Then double the number of trials to 20 and note the time.

At this point, if you increase the size of the forest or try to run more trials or more probability thresholds, the service that runs this code for you will time out.

Note

The message here is that running a simulation that would try to use a larger forest or get more accurate results (more trials, smaller probability threshold increments) is difficult to do with one processor.

The parallel MPI version

The desired outcome of the parallel version is to also produce data of average percent burned as a function of the probability of spreading, as quickly and as accurately as possible. This should take into account that the probability of the fire spreading affects not only how long each fire takes to burn out but also how many trials are needed to reach an accurate representation of the average.

If we put more processes to work on the problem, we should be able to complete a more accurate simulation in less time than the sequential version. Even the same problem as above can generate similar results running on 4 processes in almost 1/4 of the time.

The parallelization splits the total number of trials among the processes. Each process runs the full range of probability thresholds for its portion of the trials and sends its results back to the conductor process.
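A minimal mpi4py sketch of this decomposition follows. It reuses the run_trials() helper sketched earlier and hard-codes the default inputs; the way per-process averages are combined here is one reasonable choice, not necessarily the exact scheme used in the interactive code.

```python
from mpi4py import MPI

# Assumes run_trials() from the helper sketch earlier in this section.

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    n_procs = comm.Get_size()

    forest_size = 20      # illustrative defaults matching the sequential example
    prob_increment = 0.1
    total_trials = 40

    # Split the trials evenly among the processes
    # (this sketch assumes total_trials divides evenly by the number of processes).
    my_trials = total_trials // n_procs

    # Each process runs the full range of thresholds for its share of the trials.
    my_results = []
    prob_spread = 0.1
    while prob_spread < 0.9999:
        avg_iterations, avg_percent = run_trials(forest_size, prob_spread, my_trials)
        my_results.append((prob_spread, avg_iterations, avg_percent))
        prob_spread += prob_increment

    # The conductor (rank 0) gathers every process's list and averages the results.
    all_results = comm.gather(my_results, root=0)
    if rank == 0:
        for i, (prob, _, _) in enumerate(my_results):
            avg_percent = sum(proc_results[i][2] for proc_results in all_results) / n_procs
            print("threshold {:.2f}: {:.1f}% of trees burned on average".format(prob, avg_percent))

if __name__ == "__main__":
    main()
```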

Exercises

  1. Run the above with the defaults and compare the running time to the previous sequential example with the same settings. Is the time for 4 processes close to \(1/4\) the time using 1?

  2. Try 40 trials, which should give us slightly more accurate averages.

  3. To get more data and increase the trials, try using this for the command line arguments: [‘20’, ‘0.05’, ‘60’] and set the flags for mpirun to [‘-np 16’] to use 16 processes. Note how you can get more data using 16 processes in about the same time as the sequential version.

Try some other cases to observe how it scales

Ideally, as you double the number of workers on the same problem, the time should be cut in half. This is called strong scalability. But there is some overhead from the message passing, so we don’t often see perfect strong scalability.
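A common way to quantify what you observe is to compute the speedup and parallel efficiency from your measured times:

\[ S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} \]

where \(T_1\) is the running time on one process and \(T_p\) is the running time on \(p\) processes. Perfect strong scalability corresponds to \(S(p) = p\), or equivalently \(E(p) = 1\); message-passing overhead shows up as \(E(p)\) falling below 1 as \(p\) grows.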

Try running these tests and jot down your time values:

| -np | tree row size | probability increment | number of trials | running time |
|-----|---------------|------------------------|------------------|--------------|
| 2   | 20            | 0.1                    | 40               |              |
| 4   | 20            | 0.1                    | 40               |              |
| 8   | 20            | 0.1                    | 40               |              |
| 16  | 20            | 0.1                    | 40               |              |

What do you observe about the time as you double the number of processes?

When does the message passing cause the most overhead, which adds to the running time?

Try some other cases of your own design.
