CSc-387 Parallel Processing

CHAPTER 3 : EMBARRASSINGLY PARALLEL COMPUTATIONS
----------------------------------------------------------------------
----------------------------------------------------------------------

- Each process requires different (or the same) data and produces its
  results from its own input, without needing results from any other
  process. Little or no data exchange occurs between processes.
  (* T Figure 3.1 *)

- Nearly Embarrassingly Parallel Computations: some data must be
  distributed to the processes, and the results must be collected and
  combined in some way. A MASTER process starts identical slave processes,
  gives them their data, and then collects the results. Alternatively, the
  master and the slaves can all be created statically.
  (* T Figure 3.2 *)

----------------------------------------------------
3.2.1 Geometric Transformations of Images
----------------------------------------------------

Shifting: for every pixel (x,y) perform:
    x' = x + Dx   and   y' = y + Dy

Scaling: for every pixel (x,y) perform:
    x' = x*S_x    and   y' = y*S_y

Rotation (by angle A about the origin): for every pixel (x,y) perform:
    x' =  x*Cos(A) + y*Sin(A)
    y' = -x*Sin(A) + y*Cos(A)

Clipping: if the lowest values of (x,y) in the area to be displayed are
(xl,yl) and the highest values are (xh,yh), then a transformed point
(x',y') is displayed only if
    xl <= x' <= xh   and   yl <= y' <= yh;
otherwise, the point (x',y') is not displayed.

TWO WAYS OF PARTITIONING THE IMAGE: block partitioning vs. stripe
partitioning (* T Figure 3.3 *)

Sequential time complexity: O(n^2) for an n*n image.
Parallel time complexity with p processes:
    Tp = Tcomm + Tcomp ~ (p + 4n^2) + (n^2/p) = O(n^2)   (for a fixed p)

Bottleneck: the communication needed to distribute the pixels to the
slaves and collect them again.
- Better suited for a shared memory machine.
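A minimal sketch of stripe partitioning for the shifting transformation with clipping. It assumes a binary image stored as a list of rows (image[y][x] == 1 for a lit pixel) and a display area equal to the image itself (xl = yl = 0, xh = width-1, yh = height-1); the function names are hypothetical. Python threads stand in for the slaves only to show the master/slave structure; they give no real speedup here, and an actual implementation would use separate processes (e.g. with MPI).

```python
from concurrent.futures import ThreadPoolExecutor

def shift_stripe(image, row_lo, row_hi, dx, dy):
    """Slave work (hypothetical helper): shift every lit pixel in rows
    [row_lo, row_hi) by (dx, dy) and return the new coordinates that
    survive clipping."""
    height, width = len(image), len(image[0])
    moved = []
    for y in range(row_lo, row_hi):
        for x in range(width):
            if image[y][x]:
                nx, ny = x + dx, y + dy          # x' = x + Dx, y' = y + Dy
                # Clipping: keep only points inside the display area
                if 0 <= nx < width and 0 <= ny < height:
                    moved.append((nx, ny))
    return moved

def parallel_shift(image, dx, dy, p):
    """Master: split the image into p horizontal stripes, hand one stripe
    to each worker, then build the output image from the returned points."""
    height, width = len(image), len(image[0])
    rows_per = (height + p - 1) // p             # ceiling division
    bounds = [(lo, min(lo + rows_per, height)) for lo in range(0, height, rows_per)]
    result = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = pool.map(lambda b: shift_stripe(image, b[0], b[1], dx, dy), bounds)
        for part in parts:                       # collect results from the slaves
            for nx, ny in part:
                result[ny][nx] = 1
    return result
```

Note that a pixel may move into another stripe, which is why the slaves return coordinates to the master instead of writing their own stripe of the output; this collection step is the communication bottleneck mentioned above.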
On a shared memory machine the communication overhead goes away, since
all processes can access the image directly.

----------------------------------------------------
3.2.2 Mandelbrot Set
----------------------------------------------------

The Mandelbrot set is a set of points in the complex plane that are
quasi-stable (their values increase and decrease, but do not exceed some
limit) when computed by iterating the function

    z(k+1) = z(k)^2 + c

where k is the iteration number, z is a complex number, and c is the
position of the point in the complex plane. The initial value is z(0) = 0.
The iterations are continued until the magnitude of z is greater than 2,
or until a preset maximum number of iterations is reached (for points
inside the set, the magnitude of z never exceeds 2).

A sequential Mandelbrot program (written for Xlib) can be downloaded from:
http://www.coe.uncc.edu/~abw/parallel/par_prog/resources/

Computing the Mandelbrot set is computationally intensive, so it is a
widely used test for parallel computer systems. It is embarrassingly
parallel, because each pixel can be computed without any information
about the other pixels.

STATIC ASSIGNMENT: a fixed area of the display is assigned to each
process.

DYNAMIC TASK ASSIGNMENT: work pool / processor farm (* T Figure 3.5 *)

Dynamic task assignment may work better for the Mandelbrot set, because
the number of iterations needed differs from pixel to pixel and is not
known a priori. To reduce the communication overhead, a block of pixels
(e.g. one stripe) can be sent at a time instead of a single pixel.

----------------------------------------------------
3.2.3 MONTE CARLO METHODS
----------------------------------------------------

Monte Carlo methods use random selections in calculations to solve
numerical and physical problems.

Example: Finding the area of a circle (* T Figure 3.7 *)

    (Area of circle)/(Area of square) = X

Points within the square are chosen randomly, and a count is kept of how
many of them lie within the circle. The fraction of points that fall
inside estimates the ratio X; multiplying X by 4 (the area of the square)
gives the area of the circle (for a circle of radius 1 inside a 2 x 2
square, this area is pi).
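A minimal sketch of the circle example, assuming a circle of radius 1 centred in the square [-1, 1] x [-1, 1]. The total work is split into p independent hit counts to mimic p processes; here they simply run one after another. The function names and the per-worker seeds (1000 + j) are hypothetical choices, and Python generators seeded this way are not guaranteed to be uncorrelated, which is exactly the random-number concern these notes turn to next.

```python
import random

def count_hits(samples, seed):
    """One worker's job: draw `samples` random points in the 2 x 2 square
    [-1, 1] x [-1, 1] and count how many lie inside the circle of radius 1."""
    rng = random.Random(seed)           # private generator for this worker
    hits = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def circle_area(total_samples, p):
    """Master: split the samples among p workers, add up their hit counts,
    and multiply the ratio X by the area of the square (4)."""
    share = total_samples // p
    hits = sum(count_hits(share, seed=1000 + j) for j in range(p))
    ratio = hits / (share * p)          # estimate of X = circle area / square area
    return 4.0 * ratio

print(circle_area(200_000, p=4))        # an estimate close to pi
```

Only the final hit counts need to be sent to the master, so the communication cost is tiny compared with the computation.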
The same method can also be used to compute any definite integral (the
area under a curve):

    I = Integral(x1,x2) F(x) dx = (x2 - x1) * lim(N->infinity) (1/N) * SUM(i=1, N) F(x_r)

where the x_r are randomly generated values of x between x1 and x2.

MAIN CONCERN: how to produce the random numbers so that each computation
uses a different random number and there is no correlation between them.

PRODUCING RANDOM NUMBERS FOR PARALLEL MONTE CARLO METHODS
--------------------------------------------------------

One approach
------------
A separate process (typically the master) produces and distributes the
random numbers. However, this part of the code runs sequentially and
therefore it is inefficient.

A typical random number generator (Linear Congruential Generator):

    x(i+1) = (a*x(i) + c) MOD m

where a, c, and m are constants chosen to create a random sequence.
Values for a "good" generator:

    a = 16807,   m = 2^31 - 1 (a prime number),   c = 0

This generator produces a repeating sequence of (2^31 - 2) different
numbers. One disadvantage is that it is strictly sequential (each number
depends on the previous one) and relatively slow.

A Parallel Formulation
----------------------

    x(i+1) = (a*x(i) + c) MOD m                                   (1)
    x(i+k) = (A*x(i) + C) MOD m                                   (2)

where
    A = a^k mod m
    C = c*(a^(k-1) + a^(k-2) + ... + a^1 + a^0) mod m
and k is a selected "jump" constant. (* see below for the derivation of
A and C *)

A and C need to be computed only once and broadcast to the processors.
By selecting k = P (the number of processors), the first P numbers
x(0), ..., x(P-1) are generated sequentially using (1); then processor j
independently generates its own numbers x(j), x(j+P), x(j+2P), ...
using (2). (* T Figure 3.10 *)

A Practical Parallel Random Number Generator
---------------------------------------------
Fox et al.
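proposed the following formula in 1994, which generates each new random
number from two distant previous numbers:

    x(i) = (x(i-63) + x(i-127)) MOD 2^31

*********************************************************************
A and C above can be derived by computing

    x(i+1) = f(x(i)),  x(i+2) = f(f(x(i))),  ...,  x(i+k) = f(f(f(...f(x(i))...)))

where f(x) = (a*x + c) MOD m, and using the following properties (here
X, A, B, and M denote arbitrary integers):

    (1) (A + B) mod M = [(A mod M) + (B mod M)] mod M
    (2) [X*(A mod M)] mod M = (X*A) mod M
    (3) X*(A + B) mod M = (X*A + X*B) mod M = [(X*A mod M) + (X*B mod M)] mod M
    (4) [X*((A + B) mod M)] mod M = (X*A + X*B) mod M

For example, for k = 2:

    x(i+2) = [a*((a*x(i) + c) mod m) + c] mod m = (a^2*x(i) + a*c + c) mod m

so A = a^2 mod m and C = c*(a + 1) mod m. Repeating the substitution k
times gives the general A and C.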
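A minimal sketch of the parallel formulation, checking that the jump constants A and C derived above really reproduce the sequential sequence. The P "processors" are simulated by an ordinary loop, and all function names are hypothetical. Python's built-in three-argument pow performs the modular exponentiation for A.

```python
def lcg_sequence(x0, a, c, m, n):
    """Sequential generator, equation (1): returns x(0), ..., x(n-1)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append((a * xs[-1] + c) % m)
    return xs

def jump_constants(a, c, m, k):
    """A = a^k mod m and C = c*(a^(k-1) + ... + a + 1) mod m for equation (2)."""
    A = pow(a, k, m)                                  # modular exponentiation
    C = c * sum(pow(a, j, m) for j in range(k)) % m
    return A, C

def leapfrog_streams(x0, a, c, m, p, count):
    """Processor j starts from x(j), one of the first p numbers generated
    sequentially with (1), then jumps p places at a time with (2)."""
    A, C = jump_constants(a, c, m, p)     # computed once, then 'broadcast'
    seeds = lcg_sequence(x0, a, c, m, p)
    streams = []
    for j in range(p):                    # each iteration = one processor's work
        x, stream = seeds[j], []
        for _ in range(count):
            stream.append(x)
            x = (A * x + C) % m           # x(i+p) = (A*x(i) + C) mod m
        streams.append(stream)
    return streams

# Interleaving the p streams reproduces the sequential sequence.
p, count = 4, 3
streams = leapfrog_streams(1, 16807, 0, 2**31 - 1, p, count)
merged = [streams[i % p][i // p] for i in range(p * count)]
assert merged == lcg_sequence(1, 16807, 0, 2**31 - 1, p * count)
```

Because every processor needs only its starting value plus A and C, no further communication is required once the first P numbers have been distributed.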
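A sequential sketch of the Fox et al. recurrence x(i) = (x(i-63) + x(i-127)) MOD 2^31, keeping the last 127 values in a sliding window. The notes do not say how the first 127 values are chosen; filling them with the a = 16807, m = 2^31 - 1, c = 0 generator above is an assumption made only for this illustration, and the class name LaggedGenerator is hypothetical.

```python
from collections import deque

class LaggedGenerator:
    """Sketch of x(i) = (x(i-63) + x(i-127)) mod 2^31 (Fox et al., 1994)."""
    SHORT_LAG, LONG_LAG, MOD = 63, 127, 2 ** 31

    def __init__(self, seed):
        # Assumption: fill the first 127 values with the a = 16807,
        # m = 2^31 - 1, c = 0 generator; that generator needs a nonzero seed.
        m = 2 ** 31 - 1
        if not 0 < seed < m:
            raise ValueError("seed must be in 1 .. 2^31 - 2")
        x, start = seed, []
        for _ in range(self.LONG_LAG):
            x = (16807 * x) % m
            start.append(x)
        # Oldest value first: history[0] = x(i-127), history[-1] = x(i-1).
        self.history = deque(start, maxlen=self.LONG_LAG)

    def next_value(self):
        """Return x(i) and slide the window forward by one position."""
        x = (self.history[-self.SHORT_LAG] + self.history[0]) % self.MOD
        self.history.append(x)  # maxlen makes the deque drop the old x(i-127)
        return x

gen = LaggedGenerator(seed=1)
sample = [gen.next_value() for _ in range(5)]
```

Each step needs only one addition and one reduction modulo 2^31, instead of a multiplication modulo a prime.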