5.1 Vector Addition Example: exploring CUDA by timing and experimenting

In this example we explore CUDA by timing the code and experimenting with different ways of running it, using the same Vector Addition example as the previous chapter.

The code examples in the following sections of this chapter contain experimental timings for these conditions:

Section 5.2. Case 1. Running on a single thread on the host CPU.

Section 5.2. Case 2. Running on a single thread on the GPU device (not something we would normally do, but included to show the difference between CPU and GPU cores).

Sections 5.3 and 5.4. Case 3. Running on a single block of threads (grid size 1).

Sections 5.3 and 5.4. Case 4. Running on a moderately small number of blocks, using a slightly different version of the loop to perform the addition.

Sections 5.3 and 5.4. Case 5. Running on a large number of blocks, as in the previous chapter's example.
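One plausible way to write the code for each case is sketched below, as an illustration under stated assumptions rather than the chapter's own code. The kernel and helper names (`vecAddCPU`, `vecAddSingleThread`, `vecAddGridStride`, `vecAddOnePerThread`, `blocksForN`, `check`, `report`), the sizes, and the use of unified memory are hypothetical, and the "slightly different version of the loop" in Case 4 is assumed here to be a grid-stride loop. The program only checks that each case computes the right answer; it does not time anything.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Case 1 (host): a plain sequential loop on one CPU thread.
void vecAddCPU(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
}

// Case 2: one GPU thread walks the whole array; launch with <<<1, 1>>>.
__global__ void vecAddSingleThread(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
}

// Cases 3 and 4 (assumed form): a grid-stride loop. Each thread starts at its
// global index and advances by the total number of threads in the grid, so any
// grid size, including a single block, covers all n elements.
__global__ void vecAddGridStride(const float *a, const float *b, float *c, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        c[i] = a[i] + b[i];
}

// Case 5: one thread per element; the bounds check handles a partial last block.
__global__ void vecAddOnePerThread(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Smallest number of blocks such that blocks * threadsPerBlock >= n.
int blocksForN(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// True if every c[i] equals a[i] + b[i].
bool check(const float *a, const float *b, const float *c, int n) {
    for (int i = 0; i < n; i++)
        if (c[i] != a[i] + b[i]) return false;
    return true;
}

// Waits for the GPU, prints whether the result is correct, then clears c
// so the next case cannot pass by reusing an earlier result.
void report(const char *label, const float *a, const float *b, float *c, int n) {
    cudaDeviceSynchronize();
    printf("%-26s %s\n", label, check(a, b, c, n) ? "correct" : "WRONG");
    for (int i = 0; i < n; i++) c[i] = 0.0f;
}

int main() {
    const int n = 1 << 20;            // hypothetical vector length
    const int threadsPerBlock = 256;  // hypothetical block size
    const int smallGrid = 32;         // hypothetical "small" block count for Case 4
    float *a, *b, *c;
    // Unified memory keeps the sketch short (an assumption, not the chapter's setup).
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; c[i] = 0.0f; }

    vecAddCPU(a, b, c, n);
    report("Case 1: CPU, 1 thread", a, b, c, n);

    vecAddSingleThread<<<1, 1>>>(a, b, c, n);
    report("Case 2: GPU, 1 thread", a, b, c, n);

    vecAddGridStride<<<1, threadsPerBlock>>>(a, b, c, n);
    report("Case 3: GPU, 1 block", a, b, c, n);

    vecAddGridStride<<<smallGrid, threadsPerBlock>>>(a, b, c, n);
    report("Case 4: GPU, few blocks", a, b, c, n);

    vecAddOnePerThread<<<blocksForN(n, threadsPerBlock), threadsPerBlock>>>(a, b, c, n);
    report("Case 5: GPU, many blocks", a, b, c, n);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

In this sketch Cases 3 and 4 share one kernel and differ only in how many blocks are launched; the grid-stride loop is what lets a fixed, small grid handle a vector of any length, while Case 5 instead sizes the grid to the data.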


Taking timings and running experiments, as shown in the next few sections, is central to the work process in parallel and distributed computing (PDC). Just as we need to make sure our code is still correct, we also want to determine which way of running the code gives the best performance, and which factors have little effect on performance.
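As a minimal sketch of one common way to take such timings (an assumption; the chapter's own timing code may use a different mechanism), the program below times a kernel with CUDA events and the host loop with `std::chrono::steady_clock`, then repeats the GPU timing for a few grid sizes. The kernel `vecAdd`, the helpers `timeKernelMs` and `timeHostMs`, the vector length, and the block counts are hypothetical choices for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride vector addition, usable with any grid size.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        c[i] = a[i] + b[i];
}

// Times one kernel launch with CUDA events and returns milliseconds.
// Both events go into the default stream, so "stop" completes only after the kernel.
float timeKernelMs(int blocks, int threadsPerBlock,
                   const float *a, const float *b, float *c, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block the host until the stop event has completed
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Times the sequential host loop with a monotonic clock; returns milliseconds.
double timeHostMs(const float *a, const float *b, float *c, int n) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 1 << 24;            // hypothetical vector length
    const int threadsPerBlock = 256;  // hypothetical block size
    size_t bytes = n * sizeof(float);

    // Host arrays (zero-initialized) for the CPU timing.
    float *ha = new float[n](), *hb = new float[n](), *hc = new float[n]();
    printf("CPU, 1 thread:      %8.3f ms\n", timeHostMs(ha, hb, hc, n));

    // Device arrays for the GPU timings; the values do not matter for timing.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemset(da, 0, bytes);
    cudaMemset(db, 0, bytes);

    // Warm-up launch so one-time startup cost is not charged to the first measurement.
    timeKernelMs(1, threadsPerBlock, da, db, dc, n);

    // Hypothetical grid sizes: one block, a few blocks, one thread per element.
    int grids[] = {1, 32, (n + threadsPerBlock - 1) / threadsPerBlock};
    for (int blocks : grids)
        printf("GPU, %6d blocks: %8.3f ms\n", blocks,
               timeKernelMs(blocks, threadsPerBlock, da, db, dc, n));

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    delete[] ha;
    delete[] hb;
    delete[] hc;
    return 0;
}
```

Note that these GPU timings cover only the kernel itself, not copies between host and device memory; whether transfers should be included depends on the question the experiment is asking, so it is worth stating explicitly when reporting results.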
