Csc-387 - Project 4

Gaussian Elimination


For this project you will read a matrix from a file, perform gaussian elimination, and find the determinant.

To do this, we need to discuss

  1. how to store the matrix,
  2. how to perform gaussian elimination, and
  3. how to get the determinant after elimination.

I want to leave you some flexibility in deciding how to write your programs, so I'll split this into two pieces: requirements, and suggestions. Beyond the requirements you can make your design decisions however you like, but if you don't know where to start, the suggestions are there so you won't be left in the dark.

For the report, include the following:

  1. (5 pts) Electronic copy of your program. Your program must: (i) contain a program overview/summary and your student-id in the header, (ii) execute correctly, (iii) be well documented (Must include comments before every major statement, function/subroutine calls, and every MPI call explaining what the call is for. Must include specs for every function/subroutine)
  2. (5 pts)
    Read the file ``How to submit projects?'' on the class website and follow all the instructions in there.
    You must store the requested files in your subdirectories.
  3. (10 pts) Explain how the data and task partitioning is done.
  4. (25 pts) Describe in detail how your program works. And report the value of the determinant for each size matrix (100, 200, 400, 800, and 1600). (For your checking, the correct results are stored at http://web.mst.edu/~ercal/387/matrices/matrices-det)
  5. (30 pts) For each size matrix (100, 200, 400, 800, and 1600) report the running time vs. number of processors (1, 2, 4, 8, and 16.. actually there's nothing about this program that makes the number of processors need to be a power of two). Also report the speedup as a function of the number of processors for each matrix size. Note: for timing purposes, don't time the process of reading the data from the file and distributing it among the processors.. only time the computations for the Gaussian Elimination and calculating the determinant using something like (barrier; start timer; compute; barrier; stop timer).
  6. (25 pts) Explain the behavior of the plots linking it to the load balancing, task granularity and the communication overhead. You do not need to give specific numbers. Just mention the relationship between some of the following parameters: N, P, speedup, load balancing, task granularity, and the communication overhead. Use phrases like ``... it increases/decreases proportional to ...'' , ``...gets larger/smaller ...'', ``.. the increase in X is more than the increase in Y for such and such reason ...'', etc.

Bonus: As with the other programs, the person with the best time (on 16 processors with matrix size 1600x1600) will get bonus points as described in the syllabus.

Also although this assignment is reasonably self-contained, feel free to look for ideas in other resources like M.Quinn's Textbook which is on Reserve in the MST Library.


Requirements

  1. You are required to use double precision arithmetic.
  2. How to store the matrix:
o     
o               -1   2   3
o                1   1  -0.5
o                2   2   1
         

would be stored as

 
            3
            -1
            1
            2
            2
            1
            2
            3
            -0.5
            1
         

If you would prefer a different format, you can write a program to convert it yourself.

The sizes of the files are 100x100, 200x200, 400x400, 800x800, 1600x1600. The names of the files are matrix.100, matrix.200, etc.

The files are stored at http://web.mst.edu/~ercal/387/matrices/

In a "real" application, this program would be part of a larger program where the data would be generated by a previous part, and preferably not read in from a file.

  1. How to perform gaussian elimination:
    You must allow some manner of pivoting.. at least to avoid division by zero. Aside from that it's up to you.
  2. How to get the determinant after elimination:
    The final result (one number representing the determinant) should end up stored on processor 0. At that point you can stop the timer, then print the result.

Suggestions / Explanations

  1. How to store the matrix:
    I would store it column-wise in a striped fashion. Thus if we were to draw the matrix with each column labeled with the processor who owns it we would have in the case of 4 processors
2.   
3.        | | | | | | | | | |  |
4.        | | | | | | | | | |  |
5.        |P|P|P|P|P|P|P|P|P|..|
6.        |0|1|2|3|0|1|2|3|0|..|
7.        | | | | | | | | | |..|
8.        | | | | | | | | | |..|
9.   
10.      so processor zero  owns columns 0,4,8,12,16,..
11.         processor one   owns columns 1,5,9,13,17,..
12.         processor two   owns columns 2,6,10,14,18,..
13.         processor three owns columns 3,7,11,15,19,..
    

For simplicity, you may want to ensure that the matrix dimension divides evenly among the processors. The easiest way to do this is to add a few extra columns and rows to the matrix, and naturally you have to do this in such a way that it doesn't change the determinant. This is accomplished by putting 1 along the diagonal of the new rows and columns, and putting 0 elsewhere. For example

 
      If we have the matrix
        -1    2    3    1
         1    1   -0.5  0
         2    2    1   -2
         3   -2    1   -1
      and we have three processors, then life would be much easier if we
      made this 6x6 instead of 4x4.  An equivalent 6x6 would be
        -1    2    3    1    0    0
         1    1   -0.5  0    0    0
         2    2    1   -2    0    0
         3   -2    1   -1    0    0
         0    0    0    0    1    0
         0    0    0    0    0    1
    
  1. How to perform gaussian elimination:
    I am assuming some familiarity with the basic algorithm: Get zeros below the diagonal down the first column, then the second and so on. In pseudocode, we can write
15. 
16.       Algorithm for Gaussian elimination (with pivoting):
17.       
18.       Start with all the numbers stored in our NxN matrix A.
19.       
20.       For each column p, we do the following (p=1..N)
21.         First make sure that a(p,p) is non-zero and preferably large:
22.         Look at the rows in our matrix below row p.  Look at the p'th
23.         term in each row.  Select the row that has the largest absolute
24.         value in the p'th term, and swap the p'th row with that one.
25.         (optionally, you can only bother to do the above step if
26.         a(p,p) is zero).
27.       
28.         If we were fortunate enough to get a non-zero value for a(p,p),
29.         then proceed with the following for loop:
30.         For each row r below p, we do the following (r=p+1..N)
31.           row(r) = row(r)  -  (a(r,p) / a(p,p)) * row(p)
32.         End For
33.       End For
    

Note that the pivoting to get the largest element possible on the diagonal reduces the amount of error propagated. Also note that we'll consider the effect this has on the determinant in the next section.

Now lets think about this in parallel using the striped column-wise storage we talked about earlier. Every processor has to perform the same row operations. Thus for each iteration through the main loop, they all have to know both a(p,p) and a(r,p) for every r>p in column p. Thus instead of doing the broadcasts one at a time, the most sensible method would be to broadcast an entire column at a time (or at least everything at/below the diagonal). That gives all the processors enough information to do all the row operations they need.

  1. How to get the determinant after elimination:
    Once we have used gaussian elimination to form an upper triangular matrix U, we obtain the determinant of U by multiplying the elements along the diagonal. So of course the next question is how the determinant of U is related to the determinant of the original matrix A. And fortunately it is very closely related (if we followed the gaussian elimination algorithm as specified above). If we did an even number of row interchanges in the gaussian elimination algorithm, then det(A) = det(U), and if we did an odd number of them, then det(A) = -det(U).