The unified diff between revisions [7ad03227..] and [61e3fdd8..] is displayed below. It can also be downloaded as a raw diff.
# # # add_file "aca319/lab6.txt" # content [a257dd5c0110e9e4b4f3105999938600a892cc05] # ============================================================ --- aca319/lab6.txt a257dd5c0110e9e4b4f3105999938600a892cc05 +++ aca319/lab6.txt a257dd5c0110e9e4b4f3105999938600a892cc05 @@ -0,0 +1,114 @@ + +ACA 319 +Laboratory Report: Lab 6. + +Author: Grahame Bowland +Student Number: 9902827 + +1. Compare both the run-time (speedup) and result (correctness) with +the sequential version and the parallel version. + +The sequential version of my program is attached as "lab6q1.c". The +parallel version is attached as "lab6q2.c". Run times are as follows: + +Program Nodes Runtime +-----------*------------*--------------- +lab6q1.c | | 8.59s +lab6q2.c | 3 | 10.00s +lab6q2.c | 4 | 10.57s +lab6q2.c | 5 | 10.05s +lab6q2.c | 6 | 10.30s +lab6q2.c | 7 | 10.50s + +The output of both programs appears to be the same. The hot spot can +be seen to "warm" neighbouring nodes, while the ice has an overwhelming +effect to cool nodes not neighbouring on the hot spot. + +The distributed version of the program operates by dividing the task +up into rows. Each slave is assigned a particular group of rows to +calculate. Each time the graph is iterated, the following occur; + - each slave receives the rows that neighbour on it (above and + below) from the master process. These rows represent a data- + dependency, as the algorithm for each point in the edge rows + necessarily includes the rows above and below. + - each slave calculates the temperature gradient within its + assigned area + - each slaves sends its result back to the master + - the master then outputs a PGM image + +The program does not speed up as the number of nodes are increased. I +believe the problem is that the problem is relatively simple to solve +and thus the message passing overhead is excessive. Hence the distributed +version of my program is slower than the single processor version. + +There are two clear reasons that the message passing overhead is +so high: + - each slave is data dependent on its neighbours. This means that + each iteration must be calculated in lock-step. + - each slave must transmit its result back to the master at every + iteration. This is a significant transfer. + +In previous labs it was observed that MPI is a high latency communications +system. The size of the transfer is less important than the number of +transfers occuring. Unfortunately to calculate 1000 iterations on the +order of 3000 messages are required. + +2. How do you think the temperature gradient, work load for the slaves +will vary when; + +(a) using 4-connected neighbours + +This is the algorithm we implemented. The temperature from the hot +spot spreads out over neighbouring tiles, with the ice cooling all +tiles not near enough the hot-spot. The graph settles to an equilibrium. + +(b) using 8-connected neighbours + +In terms of data dependency in my distributed implementation, this +change would not affect the amount of data that would need to be +exchanged between slaves. There would be a small amount more integer +maths (as eight nodes are being averaged rather than four) but this +would be unlikely to significantly slow the program. + +The temperature gradient would settle to an equilibrium more +readily, as the hot point would affect all eight of its neighbours +directly and from there on. + +(c) scanning the image from bottom-up + +This would emphasise the effect of the ice - its cooling effect +would flow upwards very quickly. The hot-spot would heat surrounding +elements more slowly. + +This algorithm would be hard to implement in a distributed fashion +as it is by definition iterative. Each row must be calculated in +order. The individual cells in a row could possibly be split +between slaves, however the message passing overhead would be +excessive. + +(d) radiating outward from the hot-spot + +This is very hard to implement in a distributed fashion using double +buffering. Iterating outward from the hot-spot creates a large number +of data dependencies, and makes it very difficult to split the task +up into large areas of work which can be distributed over slave +processes. This would cause the slaves to spend a lot of time message +passing, and little time doing work. + +The effect of this algorithm on the graph would be to emphasise the +hot-spot. The ice would have minimal effect as its values would be +the last calculated. + +(e) scanning the entire image first, determining the next values, and only + then update the pixels + +This is the same as my implementation using 4-connected neighbours. + +3. Attach event logging for experiment 2 with 'upshot' graphs with your +report. + +The upshot graph is attached. The lines for all slave nodes are very +faint as they are calculating 1000 intervals. It can be seen from the +graph that the program is operating in parallel as designed. + +