CSc-387 Parallel Processing

CHAPTER 5: PIPELINED COMPUTATIONS

- Pipeline stages, functional decomposition:   (* T Figure 5.1 *)

- Maximum speedup that can be achieved <= # of pipeline stages

- Example: A frequency filter to remove specific frequencies from a
  digitized image   (* T Figure 5.3 *)

- Given that a problem can be divided into a series of sequential tasks,
  the pipelined approach exhibits good performance in three cases:

  TYPE 1) If many instances of the same problem are to be executed
  TYPE 2) If many data items are processed, each requiring multiple
          operations
  TYPE 3) If information to start the next process can be passed forward
          before the process has completed all its internal operations

  Space-Time Diagram for TYPE 1:   (* T Figure 5.4 *) (* T Figure 5.5 *)
  Space-Time Diagram for TYPE 2:   (* T Figure 5.6 *)
  Space-Time Diagram for TYPE 3:   (* T Figure 5.7 *)

  If the number of stages is larger than the number of processors in a
  pipeline, a group of stages can be assigned to each processor:
  (* T Figure 5.8 *)

-------------------------------------------------
5.3. PIPELINE EXAMPLES
-------------------------------------------------

Example-1. Adding numbers   (* T Figure 5.10 *)
---------
It makes sense to use this approach if many instances are executed
(e.g. m sets of n numbers: n processors, each holding an array of m
numbers, with the numbers at each array index summed separately).
In this case the problem becomes TYPE-1 and:

   Tseq = O(n*m)
   Tpar = O(n+m)
   Speedup = nm/(n+m) = m / (1 + m/n)

   (The speedup approaches n, the number of stages, as m grows.)

Example-2. Sorting Numbers   (* T Figure 5.13 *) (* T Figure 5.14 *)
---------
TYPE-2: a series of operations is performed on a series of data items.
If we implement this algorithm sequentially, then:

   Tseq = 1 + 2 + 3 + 4 + 5 + ... + (n-1) = n(n-1)/2
   Tpar = O(n)

Example-3: Solving a System of Linear Equations (Upper Triangular)
----------
   a(n-1,0)x0 + a(n-1,1)x1 + a(n-1,2)x2 + ... + a(n-1,n-1)x(n-1) = b(n-1)
   a(n-2,0)x0 + a(n-2,1)x1 + ... + a(n-2,n-2)x(n-2) = b(n-2)
   .......
   a(2,0)x0 + a(2,1)x1 + a(2,2)x2 = b2
   a(1,0)x0 + a(1,1)x1 = b1
   a(0,0)x0 = b0

SOLVE in sequence:   (* T Figure 5.18 *)

   x0 = b0/a(0,0)
   x1 = [b1 - a(1,0)x0] / a(1,1)
   ....
   xi = f(x0, x1, ..., x(i-1), bi)

Sequential Code:
----------------
   x[0] = b[0]/a[0][0];
   for (i = 1; i < n; i++) {
      sum = 0;
      for (j = 0; j < i; j++)
         sum = sum + a[i][j]*x[j];
      x[i] = (b[i] - sum)/a[i][i];
   }

   Tseq = 1 + 2 + 3 + ... + (n-1) = n(n-1)/2

Parallel Code for Process Pi (1. version):
-------------------------------------------
   /* For 0 < i < n. P0 has nothing to receive: it only computes   */
   /* x[0] and sends it on. The last process has no successor.     */
   for (j = 0; j < i; j++) {          /* receive and forward all x[j] */
      recv(&x[j], P_(i-1));
      send(&x[j], P_(i+1));
   }
   sum = 0;
   for (j = 0; j < i; j++)            /* only then start computing */
      sum = sum + a[i][j]*x[j];
   x[i] = (b[i] - sum)/a[i][i];
   send(&x[i], P_(i+1));

With a SIMPLE CHANGE (accumulating the sum inside the receive/send loop,
so that each x[j] is used as soon as it arrives), this code becomes a
TYPE-3 pipeline with much better performance.

Parallel Code for Process Pi (2. version):   (* T Figure 5.19 *)
-------------------------------------------   (* T Figure 5.20 *)
   sum = 0;
   for (j = 0; j < i; j++) {
      recv(&x[j], P_(i-1));
      send(&x[j], P_(i+1));           /* forward before computing */
      sum = sum + a[i][j]*x[j];       /* overlap with later stages */
   }
   x[i] = (b[i] - sum)/a[i][i];
   send(&x[i], P_(i+1));

   Tpar = T(recv x0) + T(recv x1, ..., x(n-2)) + T(compute x(n-1))
        = n + (n-2) + 2n
        = O(n)
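
The order in which the version-2 pipeline consumes values can be imitated in a single C program. This is only a rough sketch under stated assumptions, not real message-passing code: there is no recv/send, and the array sum[i] stands in for the local variable sum held by process Pi. The function names solve_sequential and solve_pipelined_order and the flat row-major storage of a are hypothetical choices made for this illustration. As soon as x[j] is known it is "passed down" and every later stage folds it into its partial sum, which is what the TYPE-3 version does as x[j] travels through the pipeline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Row-oriented solve, following the sequential code in the notes.
   a is an n x n matrix stored row by row: a(i,j) is a[i*n + j]. */
void solve_sequential(size_t n, const double *a, const double *b, double *x)
{
    assert(n >= 1);
    x[0] = b[0] / a[0];
    for (size_t i = 1; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < i; j++)
            sum += a[i * n + j] * x[j];
        x[i] = (b[i] - sum) / a[i * n + i];
    }
}

/* Single-process imitation of the version-2 data flow (an assumption of
   this sketch, not message passing). sum[i] plays the role of the local
   variable sum in process Pi. Returns 0 on success, or -1 if the scratch
   array cannot be allocated. */
int solve_pipelined_order(size_t n, const double *a, const double *b, double *x)
{
    assert(n >= 1);
    double *sum = malloc(n * sizeof *sum);
    if (sum == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        sum[i] = 0.0;

    for (size_t j = 0; j < n; j++) {
        /* "Process" j has already seen x[0..j-1], so sum[j] is complete. */
        x[j] = (b[j] - sum[j]) / a[j * n + j];
        /* x[j] moves down the pipeline; each later stage accumulates it. */
        for (size_t i = j + 1; i < n; i++)
            sum[i] += a[i * n + j] * x[j];
    }
    free(sum);
    return 0;
}
```

Both functions compute the same solution and differ only in the order of the multiply-add operations. For the 3 x 3 system with rows (2), (1, 3), (4, 5, 6) and b = (2, 7, 32), each yields x = (1, 2, 3).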