The unified diff between revisions [6e36f478..] and [4611588a..] is displayed below. It can also be downloaded as a raw diff.
# # # add_file "aca319/lab7.txt" # content [21ee81ca17056a7766e99f6187ca63d569380a09] # ============================================================ --- aca319/lab7.txt 21ee81ca17056a7766e99f6187ca63d569380a09 +++ aca319/lab7.txt 21ee81ca17056a7766e99f6187ca63d569380a09 @@ -0,0 +1,117 @@ + +ACA 319 +Laboratory Report: Lab 6. + +Author: Grahame Bowland +Student Number: 9902827 + +1. Discussion of algorithms implemented. + +(a) lab7q1.c; bucket sort. + +This program implements a distributed bucket sort. In order to make the +use of MPI's array splitting utility functions easier (and less prone to +error) this program will only sort a list of numbers that splits evenly +over all processors. + +Firstly, processor zero reads in a list of positive integers from an input +file. Then the function MPI_Scatter is called to send each processor an +equal part of the array. + +Each processor sorts its data into buckets. The range of numbers a given +bucket should contain is determined by dividing the range of input values +evenly by the number of processors. + +It should be noted that this method of dividing input data to buckets will +only allow for an even distribution of work in the case where the input +data is evenly distributed over the range. In any other circumstance, +certain buckets will contain more numbers than others. This will result +in some processors sorting more numbers than others. There are clearly +worst-case situations; for example when only a few values are in a low +range and the vast majority of values end up in a single processor's +bucket. + +Each processor calls MPI_Alltoall. This sends the first bucket on every +node to node one, the second bucket on every node to node two, and so on. +After the call completes each processor has all numbers from the original +data set that are contained in a given bucket. + +Each processor then sorts this data using qsort(). After completion of +this function call the processor will be left with sorted data in a distinct +range of the input values. + +MPI_Gather is then called on all nodes. This sends the sort results from each +processor node to the processor zero. Processor zero thus obtains an +array containing the ordered results from each bucket. Processor zero then +outputs the array, which displays the sorted input data. + +(b) lab7q2.c; bucket sort with sampling + +Lab7q2.c is a modified version of the program described above. I will only +discuss my changes to the original program. + +Immediately after MPI_Scatter has been called, each processor "samples" +the input data it has been allocated. Ideally (n-1) samples are taken +where n is the number of processors. If n-1 samples are not available +due to insufficient input data, the processor takes as many as it can. + +Each processor copies its samples symmetrically through an array. This +allows a call to MPI_Alltoall to produce an array on each processor +containing all samples that were taken on all nodes. Each of the +processors then samples this array. + +At this point each processor has a sorted array of samples from within +the data set. Each processor then uses the same algorithm to sample +this array again. As the data is the same on all processors, the +same values are determined. These values (n-1 of them) are used as +bounds when determining which bucket in which to place a given +input value. + +The sampled input values should be more representative of the distribution +of input values. This provides for a more uniform distribution of work +amongst processors. However, the additional overhead from sampling means +that this implementation will be slower if the input data happened to +be uniformally distributed - in such a case, sampling achieves nothing +to improve work distribution. + +2. Results + +I generated three input files. Each file contains ten thousand integers +in a range from [1,500]. The files had the following characteristics; + File 1: random numbers in the range + File 2: all values are '1' + File 3: random values in the range [450,500] + +The following table shows run times for lab7q1, lab7q2, and UNIX sort +for each of these input files. Each of the distributed programs was +run with five procesors. + + | lab7q1.c | lab7q2.c | UNIX sort +---------*-------------*--------------*-------------- +File 1 | 0.59s | 0.64s | 0.02s +File 2 | 0.59s | 0.66s | 0.01s +File 3 | 0.62s | 0.60s | 0.01s + +Due to limits in the size of the stack and disk quota on my account, I +was not able to test with a larger number of inputs. + +It can be seen that for all files, UNIX sort was much faster than +the MPI versions. This is due to the high startup time of MPI over +SSH. Hence, this discussion will focus on comparing the results for +lab7q1.c and lab7q2.c. + +lab7q1.c was faster for both files 1 and 2. This makes sense, as both +of these files are reasonably evenly distributed. Lab7q2.c is slower +in these cases, as the sampling does not significantly affect the +distribution of work between processors. + +However, for file 3 it can be seen that the sample sort was faster. +In this situation a naive bucket sort results in one or two processors +sorting very large buckets, and the other processors doing very little. +In contrast, the sample sort determines the input is heavily weighted +towards the range [450,500] and distributes the work load accordingly. + +For all test cases above the output was verified to be the same and +correct for all programs. Each program produces ten thousand ordered +numbers which matched the input set. +