The unified diff between revisions [6e36f478..] and [4611588a..] is displayed below. It can also be downloaded as a raw diff.

#
#
# add_file "aca319/lab7.txt"
#  content [21ee81ca17056a7766e99f6187ca63d569380a09]
#
============================================================
--- aca319/lab7.txt	21ee81ca17056a7766e99f6187ca63d569380a09
+++ aca319/lab7.txt	21ee81ca17056a7766e99f6187ca63d569380a09
@@ -0,0 +1,117 @@
+
+ACA 319
+Laboratory Report: Lab 6.
+
+Author: Grahame Bowland
+Student Number: 9902827
+
+1. Discussion of algorithms implemented.
+
+(a) lab7q1.c; bucket sort.
+
+This program implements a distributed bucket sort. In order to make the
+use of MPI's array splitting utility functions easier (and less prone to
+error) this program will only sort a list of numbers that splits evenly
+over all processors.
+
+Firstly, processor zero reads in a list of positive integers from an input
+file. Then the function MPI_Scatter is called to send each processor an
+equal part of the array.
+
+Each processor sorts its data into buckets. The range of numbers a given
+bucket should contain is determined by dividing the range of input values
+evenly by the number of processors.
+
+It should be noted that this method of dividing input data to buckets will
+only allow for an even distribution of work in the case where the input
+data is evenly distributed over the range. In any other circumstance,
+certain buckets will contain more numbers than others. This will result
+in some processors sorting more numbers than others. There are clearly
+worst-case situations; for example when only a few values are in a low
+range and the vast majority of values end up in a single processor's
+bucket.
+
+Each processor calls MPI_Alltoall. This sends the first bucket on every
+node to node one, the second bucket on every node to node two, and so on.
+After the call completes each processor has all numbers from the original
+data set that are contained in a given bucket.
+
+Each processor then sorts this data using qsort(). After completion of
+this function call the processor will be left with sorted data in a distinct
+range of the input values.
+
+MPI_Gather is then called on all nodes. This sends the sort results from each
+processor node to the processor zero. Processor zero thus obtains an
+array containing the ordered results from each bucket. Processor zero then
+outputs the array, which displays the sorted input data.
+
+(b) lab7q2.c; bucket sort with sampling
+
+Lab7q2.c is a modified version of the program described above. I will only
+discuss my changes to the original program.
+
+Immediately after MPI_Scatter has been called, each processor "samples"
+the input data it has been allocated. Ideally (n-1) samples are taken
+where n is the number of processors. If n-1 samples are not available
+due to insufficient input data, the processor takes as many as it can.
+
+Each processor copies its samples symmetrically through an array. This
+allows a call to MPI_Alltoall to produce an array on each processor
+containing all samples that were taken on all nodes. Each of the
+processors then samples this array.
+
+At this point each processor has a sorted array of samples from within
+the data set. Each processor then uses the same algorithm to sample
+this array again. As the data is the same on all processors, the
+same values are determined. These values (n-1 of them) are used as
+bounds when determining which bucket in which to place a given
+input value.
+
+The sampled input values should be more representative of the distribution
+of input values. This provides for a more uniform distribution of work
+amongst processors. However, the additional overhead from sampling means
+that this implementation will be slower if the input data happened to
+be uniformally distributed - in such a case, sampling achieves nothing
+to improve work distribution.
+
+2. Results
+
+I generated three input files. Each file contains ten thousand integers
+in a range from [1,500]. The files had the following characteristics;
+  File 1: random numbers in the range
+  File 2: all values are '1'
+  File 3: random values in the range [450,500]
+
+The following table shows run times for lab7q1, lab7q2, and UNIX sort
+for each of these input files. Each of the distributed programs was
+run with five procesors.
+
+         | lab7q1.c    | lab7q2.c     | UNIX sort
+---------*-------------*--------------*--------------
+File 1   | 0.59s       | 0.64s        | 0.02s
+File 2   | 0.59s       | 0.66s        | 0.01s
+File 3   | 0.62s       | 0.60s        | 0.01s
+
+Due to limits in the size of the stack and disk quota on my account, I
+was not able to test with a larger number of inputs.
+
+It can be seen that for all files, UNIX sort was much faster than
+the MPI versions. This is due to the high startup time of MPI over
+SSH. Hence, this discussion will focus on comparing the results for
+lab7q1.c and lab7q2.c.
+
+lab7q1.c was faster for both files 1 and 2. This makes sense, as both
+of these files are reasonably evenly distributed. Lab7q2.c is slower
+in these cases, as the sampling does not significantly affect the
+distribution of work between processors.
+
+However, for file 3 it can be seen that the sample sort was faster.
+In this situation a naive bucket sort results in one or two processors
+sorting very large buckets, and the other processors doing very little.
+In contrast, the sample sort determines the input is heavily weighted
+towards the range [450,500] and distributes the work load accordingly.
+
+For all test cases above the output was verified to be the same and
+correct for all programs. Each program produces ten thousand ordered
+numbers which matched the input set.
+