The unified diff between revisions [7ad03227..] and [61e3fdd8..] is displayed below. It can also be downloaded as a raw diff.

#
#
# add_file "aca319/lab6.txt"
#  content [a257dd5c0110e9e4b4f3105999938600a892cc05]
#
============================================================
--- aca319/lab6.txt	a257dd5c0110e9e4b4f3105999938600a892cc05
+++ aca319/lab6.txt	a257dd5c0110e9e4b4f3105999938600a892cc05
@@ -0,0 +1,114 @@
+
+ACA 319
+Laboratory Report: Lab 6.
+
+Author: Grahame Bowland
+Student Number: 9902827
+
+1. Compare both the run-time (speedup) and result (correctness) with
+the sequential version and the parallel version.
+
+The sequential version of my program is attached as "lab6q1.c". The
+parallel version is attached as "lab6q2.c". Run times are as follows:
+
+Program		Nodes	  Runtime
+-----------*------------*---------------
+lab6q1.c   |  		| 8.59s
+lab6q2.c   |	3	| 10.00s
+lab6q2.c   |	4	| 10.57s
+lab6q2.c   |	5	| 10.05s
+lab6q2.c   |	6	| 10.30s
+lab6q2.c   |	7	| 10.50s
+
+The output of both programs appears to be the same. The hot spot can
+be seen to "warm" neighbouring nodes, while the ice has an overwhelming
+effect to cool nodes not neighbouring on the hot spot.
+
+The distributed version of the program operates by dividing the task
+up into rows. Each slave is assigned a particular group of rows to
+calculate. Each time the graph is iterated, the following occur;
+  - each slave receives the rows that neighbour on it (above and
+    below) from the master process. These rows represent a data-
+    dependency, as the algorithm for each point in the edge rows
+    necessarily includes the rows above and below.
+  - each slave calculates the temperature gradient within its
+    assigned area
+  - each slaves sends its result back to the master
+  - the master then outputs a PGM image
+
+The program does not speed up as the number of nodes are increased. I
+believe the problem is that the problem is relatively simple to solve
+and thus the message passing overhead is excessive. Hence the distributed
+version of my program is slower than the single processor version.
+
+There are two clear reasons that the message passing overhead is
+so high:
+  - each slave is data dependent on its neighbours. This means that
+    each iteration must be calculated in lock-step.
+  - each slave must transmit its result back to the master at every
+    iteration. This is a significant transfer.
+
+In previous labs it was observed that MPI is a high latency communications
+system. The size of the transfer is less important than the number of
+transfers occuring. Unfortunately to calculate 1000 iterations on the
+order of 3000 messages are required.
+
+2. How do you think the temperature gradient, work load for the slaves
+will vary when;
+
+(a) using 4-connected neighbours
+
+This is the algorithm we implemented. The temperature from the hot
+spot spreads out over neighbouring tiles, with the ice cooling all
+tiles not near enough the hot-spot. The graph settles to an equilibrium.
+
+(b) using 8-connected neighbours
+
+In terms of data dependency in my distributed implementation, this
+change would not affect the amount of data that would need to be
+exchanged between slaves. There would be a small amount more integer
+maths (as eight nodes are being averaged rather than four) but this
+would be unlikely to significantly slow the program.
+
+The temperature gradient would settle to an equilibrium more
+readily, as the hot point would affect all eight of its neighbours
+directly and from there on.
+
+(c) scanning the image from bottom-up
+
+This would emphasise the effect of the ice - its cooling effect
+would flow upwards very quickly. The hot-spot would heat surrounding
+elements more slowly.
+
+This algorithm would be hard to implement in a distributed fashion
+as it is by definition iterative. Each row must be calculated in
+order. The individual cells in a row could possibly be split
+between slaves, however the message passing overhead would be
+excessive.
+
+(d) radiating outward from the hot-spot
+
+This is very hard to implement in a distributed fashion using double
+buffering. Iterating outward from the hot-spot creates a large number
+of data dependencies, and makes it very difficult to split the task
+up into large areas of work which can be distributed over slave
+processes. This would cause the slaves to spend a lot of time message
+passing, and little time doing work.
+
+The effect of this algorithm on the graph would be to emphasise the
+hot-spot. The ice would have minimal effect as its values would be
+the last calculated.
+
+(e) scanning the entire image first, determining the next values, and only
+    then update the pixels
+
+This is the same as my implementation using 4-connected neighbours.
+
+3. Attach event logging for experiment 2 with 'upshot' graphs with your
+report.
+
+The upshot graph is attached. The lines for all slave nodes are very
+faint as they are calculating 1000 intervals. It can be seen from the
+graph that the program is operating in parallel as designed.
+
+