CSc-487 Advanced Parallel Computation
============================

2.15 SORTING (Sahni p.65)
============================

(** For simplicity, we will always consider "sorting" in "nondecreasing" order throughout this chapter **)

Bitonic sequence : a nonincreasing run followed by a nondecreasing run
                   (nonincreasing ..... nondecreasing)

    ex: 24, 20, 9, 2, 1, 8, 10, 11, 12, 13, 30

Sorting a bitonic sequence into nondecreasing order :
-----------------------------------------------------
STEP 1: Sort the ODD subsequence  (24, 9, 1, 10, 12, 30) RECURSIVELY.
STEP 2: Sort the EVEN subsequence (20, 2, 8, 11, 13) RECURSIVELY.
        (Both subsequences of a bitonic sequence are themselves bitonic, so the recursion is valid.)

        RESULT1 : 1 9 10 12 24 30
        RESULT2 : 2 8 11 13 20

STEP 3: Interleave RESULT1 and RESULT2 (odd_1, even_1, odd_2, even_2, ...) and compare/exchange each pair (odd_i, even_i):

        interleaved : 1 2  9 8  10 11  12 13  24 20  30
        RESULT      : 1 2  8-9  10 11  12 13  20-24  30      (exchanged pairs marked with -)

Proof of Correctness for Batcher's Bitonic Sort Algorithm
----------------------------------------------------------
In 2 steps:

THEOREM 2.7 : [Knuth's 0/1 Principle]
============
If a sorting algorithm that performs only element comparisons and exchanges sorts all sequences of zeroes and ones, then it sorts all sequences of arbitrary numbers.

Proving       SORTS ALL 0/1 SEQUENCES      ====>  SORTS ANY SEQUENCE
is equivalent to proving the contrapositive
              FAILS TO SORT SOME SEQUENCE  ====>  FAILS TO SORT SOME 0/1 SEQUENCE

PROOF:
------
Let f be a monotonically nondecreasing function:  x <= y  ======>  f(x) <= f(y)

If a compare/exchange algorithm transforms (x1, x2, ..., xn) into (y1, y2, ..., yn), it also transforms (f(x1), f(x2), ..., f(xn)) into (f(y1), f(y2), ..., f(yn)): for a nondecreasing f, min(f(a), f(b)) = f(min(a, b)) and max(f(a), f(b)) = f(max(a, b)), so every compare/exchange step commutes with f.

Suppose the algorithm fails to sort some sequence x, producing a y with y(i) > y(i+1) for some i (UNSORTED).
Define f as:  f(x) = 0 for x < y(i)   and   f(x) = 1 for x >= y(i)
Then the algorithm transforms the 0/1 sequence (f(x1), f(x2), ..., f(xn)) into the 0/1 sequence
(f(y1), f(y2), ..., f(y(i)), f(y(i+1)), ..., f(yn)), in which f(y(i)) = 1 and f(y(i+1)) = 0, so it is NOT SORTED:

      . . .  1  0  . . .
             i  i+1

Comments by Mark Allen on the conditions for a sorting algorithm to be called a "comparison-based sorting algorithm":
-----------------------------------------------------------------
As far as the 0/1 proof goes, I think the conditions on a "comparison-based sorting algorithm" would be the following (these conditions should be sufficient, and although I don't prove it, I'm quite confident I can write a counter-example program if you try to weaken them much):

  1.) The only two operations which can be used to modify the data in the array are
      a.) compare and exchange, and
      b.) swap (unconditionally).
  2.) For the logical expressions that direct the flow of control in our program, we can use any kind of logical expression, as long as the expression does not vary with the contents of the array. "Contents" means the values of A[0], A[1], ..., and A[n-1]. So the number of elements in the array, n, is fair game to use in our logical expressions.

The goal of these conditions is that, given an array and a nondecreasing function f, the algorithm performs exactly the same swaps on the elements of the array x1, x2, ..., xn as it does on the array f(x1), f(x2), ..., f(xn); or, if it does not perform some swap on f(xi) and f(xj) where it did on xi and xj, it is because f(xi) == f(xj). Another desirable result of these conditions is that (since the flow of control is unrelated to the contents of the array) two arrays of equal size get sorted in the same amount of time.

The first condition is obvious. It is what we have been saying the whole time. The only two operations which can be used to modify the data in the array are
  1.) compare and exchange
  2.) swap (unconditionally)
By the way, the compare and exchange operation may test for <, <=, >, >=, or == between the two array elements in making its decision of whether or not to exchange those two elements.
The second condition is less obvious, and I've never heard it stated when people discuss comparison-based sorting algorithms, but it is just as important. We must restrict what kinds of logical expressions can be used to alter the flow of control [e.g., the logical expressions in "if" and "while" statements]. The condition is that we can use any kind of logical expression we want, as long as the expression does not vary with the contents of the array.

Note: this condition implies in particular that we may NOT use array elements in comparisons that affect the flow of control of the program (even if _both_ operands are array elements). The ONLY time such comparisons happen is inside a "compare exchange" command.

To see the problem if we allow comparisons between elements of the array to be used in an "if" statement, here is an example of such a program that contradicts the 0/1 principle:

    Make a couple of loops that go through comparing all possible pairs of array elements, and keep track of whether any of them have equal values.
    If (we find _any_ two elements that are equal OR the array has only one or two elements) then
        we say "ah, I like this array, I'll sort it correctly."
        next we sort using some legitimate sorting algorithm
    else
        we say "nope, I think I'll just output garbage this time."
    endif
    End of algorithm.

Note that the condition of this "if" statement always evaluates true for a sequence of 0's and 1's (with three or more elements, two of them must be equal). So it will always sort these correctly, but it will not sort all arbitrary sequences. The problem is that the condition in the "if" statement varies with the contents of the array.

I just wanted to point this out because I consider it quite dangerous to oversimplify what it means for an algorithm to be comparison-based. I don't mean to overcomplicate it either. If we wanted to do that, we could write a grammar that generates all legitimate comparison-based algorithms :) . It wouldn't be too informative though, I don't think.
- Mark

THEOREM 2.8 : The bitonic sort program sorts all even-length bitonic 0/1 sequences.
============
bitonic 0/1 seq     : 1^a 0^b 1^c        (a+b+c is even)
sorted odd subseq   : 0^d 1^x
sorted even subseq  : 0^e 1^y

Since the 0's form a single contiguous block, the numbers of odd and even positions inside it differ by at most one, so |d-e| <= 1. The cases are:

  i)   d = e        odd : 0 0 ...... 0 0 1 1 .... 1
                    even: 0 0 ...... 0 0 1 1 .... 1

  ii)  e = d + 1    odd : 0 0 ...... 0 0 1 1 .... 1
                    even: 0 0 ...... 0 0 0 1 .... 1

  iii) d = e + 1    odd : 0 0 ...... 0 0 0 1 .... 1
                    even: 0 0 ...... 0 0 1 1 .... 1

After interleaving, cases i) and iii) are already sorted; in case ii) the only inversion is the pair (odd_(d+1), even_(d+1)) = (1, 0), which the compare/exchange step (3) fixes. So after step (3) we always have a sorted sequence.

RESULT: Since bitonic sort is based ONLY on element compare-exchanges, Theorem 2.7 implies that it sorts any arbitrary bitonic sequence (a nondecreasing f maps any bitonic sequence to a 0/1 sequence of the form 1^a 0^b 1^c).

When n = 2^k (a power of 2), the recursion unfolds nicely. Here is how:

........... USE EXAMPLE FROM FIGURE 2.14 ..................

BITONIC: 7 6 4 0 1 2 3 5
    Sort ODD1 (bitonic)   : 7 4 1 3
    Sort EVEN2 (bitonic)  : 6 0 2 5
    Compare/Exchange

To sort ODD1 (bitonic): 7 4 1 3
    Sort ODD11 (bitonic)  : 7 1
    Sort EVEN12 (bitonic) : 4 3
    Compare/Exchange

To sort ODD11 (bitonic): 7 1
    Sort ODD111  : 7
    Sort EVEN112 : 1
    Compare/Exchange
------------------------------------------------------------------------
With P elements, at the deepest level of the recursion the compare/exchanges are between elements that are P/2 apart in the original sequence (e.g., 7 and 1 above). As the recursion unwinds, comparisons are made at distances of P/4, P/8, ..., 1.

ANOTHER EXAMPLE (full bitonic sort of an unsorted sequence, P = 8):
----------------
Unsorted sequence : C N M F H A P D   (sorted in windows of 1; bitonic sequences of size 2)
D=1               : C N M F A H P D   (sorted in windows of 2; bitonic sequences of size 4)
D=2               : C F M N P H A D
                    C F M N P H D A   (sorted in windows of 4; bitonic sequences of size 8)
D=3               : C F D A P H M N
                    C A D F M H P N
                    A C D F H M N P   (sorted in a window of 8)

TIME COMPLEXITY :
========================
BITONIC SORT ON A HYPERCUBE
---------------------------
Bito-sort on windows of size 2
Bito-sort on windows of size 4 : noninc - nondec - noninc - ....
Bito-sort on windows of size 8 : noninc - nondec - noninc - ....
Bito-sort on windows of size 16: noninc - nondec - noninc - ....
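As a sanity check on STEPS 1-3, Theorem 2.8, and the window-by-window example, here is a minimal Python sketch. The names merge_bitonic, bitonic_sort, and sorts_all_01 are hypothetical, chosen for illustration. merge_bitonic follows STEPS 1-3 literally for a bitonic sequence of any length. bitonic_sort assumes the input length is a power of 2 and builds sorted windows of size 2, 4, ..., n with alternating directions, pairing element i with i XOR dist, i.e., its neighbour across one hypercube dimension. sorts_all_01 applies Theorem 2.7 by brute force over all 0/1 inputs of one length; it is a check for that length, not a proof.

```python
from itertools import product


def merge_bitonic(seq):
    """STEPS 1-3: sort one bitonic sequence (nonincreasing, then nondecreasing)."""
    if len(seq) <= 1:
        return list(seq)
    odd = merge_bitonic(seq[0::2])    # STEP 1: 1st, 3rd, 5th, ... elements
    even = merge_bitonic(seq[1::2])   # STEP 2: 2nd, 4th, 6th, ... elements
    out = []
    for a, b in zip(odd, even):       # STEP 3: interleave, compare/exchange (odd_i, even_i)
        out.extend((a, b) if a <= b else (b, a))
    if len(odd) > len(even):          # odd length: the last odd element has no partner
        out.append(odd[-1])
    return out


def bitonic_sort(keys):
    """Sort 2**k keys window by window (window sizes 2, 4, ..., n).

    Returns (sorted list, snapshot after each window size, number of steps).
    """
    a = list(keys)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    snapshots, steps = [], 0
    size = 2
    while size <= n:
        dist = size // 2
        while dist >= 1:                      # one parallel compare/exchange step
            for i in range(n):
                j = i ^ dist                  # partner differs from i in one address bit
                if i < j:
                    ascending = (i & size) == 0   # windows alternate nondec / noninc
                    if (a[i] > a[j]) == ascending:
                        a[i], a[j] = a[j], a[i]
            steps += 1
            dist //= 2
        snapshots.append(list(a))
        size *= 2
    return a, snapshots, steps


def sorts_all_01(sort_fn, n):
    """Theorem 2.7 in practice: run sort_fn on all 2**n sequences of 0s and 1s."""
    return all(sort_fn(bits) == sorted(bits) for bits in product((0, 1), repeat=n))


if __name__ == "__main__":
    print(merge_bitonic([24, 20, 9, 2, 1, 8, 10, 11, 12, 13, 30]))
    # [1, 2, 8, 9, 10, 11, 12, 13, 20, 24, 30]
    result, snapshots, steps = bitonic_sort("CNMFHAPD")
    for s in snapshots:
        print("".join(s))             # CNMFAHPD, then CFMNPHDA, then ACDFHMNP
    print(steps)                      # 6 = 1 + 2 + 3
    print(sorts_all_01(lambda xs: bitonic_sort(xs)[0], 8))   # True
```

Each pass of the inner `while` loop is one parallel compare/exchange step, so for n = 2^k the returned step count is 1 + 2 + ... + k.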
Bito-sort on windows of size 2   : 1 step
Bito-sort on windows of size 4   : 2 steps
Bito-sort on windows of size 8   : 3 steps
Bito-sort on windows of size 16  : 4 steps
........
Bito-sort on windows of size 2^k : k steps
(each step compares elements whose addresses differ in one bit, i.e., hypercube neighbours)

Total STEPS = 1 + 2 + 3 + 4 + ..... + k      where k = log N
            = k(k+1)/2
            = O((log N)^2)

----------------------------------------------
SORTING ON A MESH CONNECTED PARALLEL COMPUTER
----------------------------------------------
C.D. Thompson and H.T. Kung, Comm. of the ACM, April 1977, Vol. 20, No. 4, pp. 263-271
(*** Check "Parallel Algorithms" by Pranay Chaudhuri, page 98- ***)

- A square mesh of N = n x n identical processors
- SIMD type computation; only exchanges (routing) and comparisons

Final Sorted Configurations are: (** T refer to Figures **)
---------------------------------
(i)   Row-major indexing
(ii)  Shuffled row-major indexing
(iii) Snake-like row-major indexing

A Lower Bound
-------------
No sorting algorithm can do better than O(n) time, i.e., sorting on the mesh needs Omega(n) steps: consider two data items that are initially out of place, in opposite corners; moving one of them to the other corner alone takes 2(n-1) unit routing steps.

--------------------------------
Odd-Even Merge on a Linear Array
--------------------------------
(** T refer to Figures **)
- Batcher's Odd-Even merge of two sorted sequences

      1  3  4  7  9 10 15 22        0  1  2  3  5  6 11 25

L1: Unshuffle: odd-indexed elements to the left, even-indexed elements to the right
      1  4  9 15   0  2  5 11      3  7 10 22   1  3  6 25
      ----------   ----------      ----------   ----------
        sorted       sorted          sorted       sorted

L2: Merge-sort (recursively) the "odd sequences" and the "even sequences"
      0  1  2  4  5  9 11 15        1  3  3  6  7 10 22 25

L3: Shuffle
      0  1--1  3--2  3--4  6--5  7--9 10--11 22--15 25

L4: Comparison-interchange of the pairs marked -- (2nd with 3rd, 4th with 5th, ...)
      0  1  1  2  3  3  4  5  6  7  9 10 11 15 22 25       DONE !
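Steps L1-L4 can be sketched in Python as below, assuming both input lists are sorted and have the same power-of-2 length; the function names are hypothetical. For brevity, L1 is done by slicing; on the linear array it would be the triangular interchange pattern (described next in these notes) run backwards, which unshuffle_triangular shows. L3 uses that triangular pattern: each step swaps disjoint neighbour pairs, so the returned count is the number of parallel steps.

```python
def shuffle_triangular(x):
    """Perfect shuffle of x = left half + right half on a linear array.

    Step s exchanges s disjoint neighbour pairs around the middle, so all swaps
    of one step can run in parallel. Returns (shuffled list, number of steps).
    """
    x = list(x)
    m = len(x) // 2
    steps = 0
    for s in range(1, m):
        for k in range(s):
            i = m - s + 2 * k
            x[i], x[i + 1] = x[i + 1], x[i]
        steps += 1
    return x, steps


def unshuffle_triangular(x):
    """Inverse of shuffle_triangular: the same swap steps in reverse order."""
    x = list(x)
    m = len(x) // 2
    steps = 0
    for s in range(m - 1, 0, -1):
        for k in range(s):
            i = m - s + 2 * k
            x[i], x[i + 1] = x[i + 1], x[i]
        steps += 1
    return x, steps


def odd_even_merge(a, b):
    """Batcher's odd-even merge of two sorted lists of equal power-of-2 length."""
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    odds = odd_even_merge(a[0::2], b[0::2])    # L1 + L2 on the odd-indexed elements
    evens = odd_even_merge(a[1::2], b[1::2])   # L1 + L2 on the even-indexed elements
    merged, _ = shuffle_triangular(odds + evens)   # L3: shuffle
    for i in range(1, len(merged) - 1, 2):     # L4: compare 2nd/3rd, 4th/5th, ...
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged


if __name__ == "__main__":
    A = [1, 3, 4, 7, 9, 10, 15, 22]
    B = [0, 1, 2, 3, 5, 6, 11, 25]
    print(odd_even_merge(A, B))
    # [0, 1, 1, 2, 3, 3, 4, 5, 6, 7, 9, 10, 11, 15, 22, 25]
    print(shuffle_triangular([0, 1, 2, 4, 5, 9, 11, 15, 1, 3, 3, 6, 7, 10, 22, 25]))
    # ([0, 1, 1, 3, 2, 3, 4, 6, 5, 7, 9, 10, 11, 22, 15, 25], 7)
```

With P = 16 elements the shuffle takes 7 = P/2 - 1 parallel steps, matching the complexity stated for L3 and L1.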
L3: The PERFECT SHUFFLE can be performed by using the triangular interchange pattern as follows (** T refer to Figures **); at each step the pairs marked -- are exchanged in parallel:

      0  1  2  4  5  9 11 15--1  3  3  6  7 10 22 25
      0  1  2  4  5  9 11--1 15--3  3  6  7 10 22 25
      0  1  2  4  5  9--1 11--3 15--3  6  7 10 22 25
      0  1  2  4  5--1  9--3 11--3 15--6  7 10 22 25
      0  1  2  4--1  5--3  9--3 11--6 15--7 10 22 25
      0  1  2--1  4--3  5--3  9--6 11--7 15--10 22 25
      0  1--1  2--3  4--3  5--6  9--7 11--10 15--22 25
      0  1  1  3  2  3  4  6  5  7  9 10 11 22 15 25

(L1: unshuffle is just the opposite: the same steps in reverse order.)
Complexity of L3 and L1 (shuffle/unshuffle) = P/2 - 1 steps for P elements (here P = 16: 7 steps).

Q: What is the order of complexity for the whole sort process?
   Merging:
   T(n) = n + T(n/2)
        = n + n/2 + n/4 + .... + 2 + 1
        = 2n - 1 = O(n)

   To obtain the first 2 sorted sequences:
   Init(n) = T(1) + T(2) + T(4) + .... + T(n)
           = 1 + 3 + 7 + .... + (2n - 1)
           < 2(1 + 2 + 4 + ... + n) < 4n = O(n)

Q: Can we do better with a 2-dimensional approach?

------------------------------------
Hybrid Parallel Merge-Sort Algorithm
------------------------------------
Input: two sorted columns of an (n x 2) mesh, here n = 8
(left column: 1 2 4 6 18 22 25 27, right column: 7 8 10 12 14 15 17 32)

 Input       J1: ODDS to LEFT,   J2: ODD-EVEN       J3: interchange   J4: compare-exchange
             EVENS to RIGHT          transposition      on even rows      every "even" with the
                                     sort on the                          next "odd" (positions
                                     columns                              in snake-like order)
  1 <-- 7     1,7   *             1   2              1   2             1   2
  2 --> 8      *   2,8            4---6              6   4             6   4
  4 <-- 10    4,10  *             7   8              7   8             7   8
  6 --> 12     *   6,12          10---12            12  10            12  10
 18 <-- 14   18,14  *            14  15             14  15            14  15
 22 --> 15     *   22,15         17---22            22  17            18  17
 25 <-- 17   25,17  *            18  27             18  27            22  25
 27 --> 32     *   27,32         25---32            32  25            32  27
                                                                      SORTED (snake-like)

             J1: O(1)            J2: O(n/2)         J3: O(1)          J4: O(1)

OVERALL COMPLEXITY for T(n,2) = O(n)

-----------------------------------------------
SORTING ON AN (n x k) MESH USING TWO-WAY-MERGE
-----------------------------------------------
TWO-WAY-MERGE ALGORITHM (RECURSIVE):
Initially, consider two sorted lists, each of size nk/2 and sorted in snake-like row-major order, one in the left half of the mesh and the other in the right half.
 Initial (n = k = 4)     M1: single interchange     M2: unshuffle each
                             on even rows, O(1)         row, O(k)
  2  4 |  0  1            2  4 |  0  1               2  0 |  4  1
  8  5 |  6  3            5  8 |  3  6               5  3 |  8  6
 10 13 |  7  9           10 13 |  7  9              10  7 | 13  9
 15 14 | 12 11           14 15 | 11 12              14 11 | 15 12

 M3: merge each half by   M4: shuffle each      M5: interchange on    M6: compare-exchange every
     calling M(n,k/2),        row, O(k)             even rows, O(1)       "even" with the next
     time T(n,k/2)                                                        "odd", O(1)
  0  2 |  1  4             0  1  2  4            0  1  2  4            0  1  2  3
  5  3 |  8  6             5  8  3  6            8  5  6  3            7  6  5  4
  7 10 |  9 12             7  9 10 12            7  9 10 12            8  9 10 11
 14 11 | 15 13            14 15 11 13           15 14 13 11           15 14 13 12

Overall Complexity = M1 + M2 + M3 + M4 + M5 + M6
        T(n,k) = O(k) + T(n,k/2)

Time Complexity for the 2-WAY Merge Algorithm:
   T(n,k) = k + T(n,k/2)
          = k + k/2 + k/4 + k/8 + ....... + 4 + T(n,2)
          = k + k/2 + k/4 + k/8 + ....... + 4 + O(n)      (T(n,2) = O(n) by the hybrid merge)
          = O(n + k)

TO SORT AN ARBITRARY SEQUENCE:
   First sort each column in parallel by using odd-even transposition sort (O(n) time).
   Then merge pairs of sorted columns in parallel, first using HYBRID_MERGE, then using TWO_WAY MERGE.

   OVERALL TIME FOR SORTING = Initial-Sort + T(n,2) + T(n,4) + ... + T(n,n)
                            = O(n) + O(n) + ...... + O(n)    (log n merge phases, each O(n + k) = O(n) since k <= n)
   Overall Time-Complexity  = O(n log n)

QUESTION: Can we reduce the time complexity by moving data in both horizontal and vertical directions? (* YES! O(n) *)
   ** To be covered if time permits **

-------------------------------------
BITONIC MERGE ON A 2-DIMENSIONAL MESH
-------------------------------------
Figure 12 on p.270
------------------
Initial Data Configuration AND STAGE-1: Merge pairs of adjacent 1x1 matrices by the comparison-interchanges indicated (each arrow points to where the larger element goes):

   10 -> 9     14 <- 2
    4 <- 15    11 -> 12
    6 -> 1      5 <- 13
    8 <- 3      7 -> 0

STAGE-2: Merge pairs of 1x2 matrices; note that one member of a pair is sorted in ascending order, the other in descending order. This is always the case for bitonic merge.
(First the vertical comparisons, |D: larger element moves down, |U: larger element moves up; then the horizontal ones.)

    9  10    14   2           9 -> 4     14 <- 12
    |D  |D    |U  |U
   15   4    11  12          15 -> 10    11 <- 2

    1   6    13   5           1 -> 3     13 <- 7
    |D  |D    |U  |U
    8   3     0   7           8 -> 6      0 <- 5

STAGE-3: Merge pairs of 2x2 matrices.
(Result is sorted 2x4 matrices: ascending-descending order)
(Routing first exchanges the entries in columns 2 and 4 of row 2, and the left and right halves of row 3; then come the vertical |D / |U comparisons and the horizontal ones.)

   before routing      after routing
    4  9 14 12          4  9 14 12          4 -> 2     11 -> 12
                        |D |D |D |D
   10 15 11  2         10  2 11 15         10 -> 9     14 -> 15

    1  3 13  7         13  7  1  3         13 <- 8      5 <- 3
                        |U |U |U |U
    6  8  5  0          6  8  5  0          6 <- 7      1 <- 0

STAGE-4: Merge the two 2x4 matrices
(Panels: input; after routing (the right halves of rows 1 and 3 are exchanged, and rows 2 and 4 are exchanged); after compare-exchange of entries two columns apart; vertical and then horizontal comparisons; result.)

   2  4 11 12       2  4  5  3       2  3  5  4       1 -> 0    5 -> 4       0  1  4  5
                                     |  |  |  |
                                     v  v  v  v
   9 10 14 15       7  6  1  0       1  0  7  6       2 -> 3    7 -> 6       2  3  6  7

  13  8  5  3      13  8 11 12      11  8 13 12       9 -> 8   13 -> 12      8  9 12 13
                                     |  |  |  |
                                     v  v  v  v
   7  6  1  0       9 10 14 15       9 10 14 15      11 -> 10  14 -> 15     10 11 14 15

                                                                     SORTED: Shuffled Row-Major

COMPLEXITY ANALYSIS
--------------------
- Let T(2^i) be the time to merge two sorted sequences (one ascending, one descending) of 2^(i-1) elements each. Then

      T(2^i) = 2^[i/2]       +   T(2^(i-1))
               |                 |
               first routing     merging 2^(i-1) elements
               step

      T(2^i) = 2^[i/2] + 2^[(i-1)/2] + 2^[(i-2)/2] + ... + 2^[1/2] + T(1)       (T(1) = 0)
            <= 2*{1 + 2 + 2^2 + 2^3 + ...... + 2^(i/2)}
             = 2*{2^(i/2 + 1) - 1}
             = 4*2^(i/2) - 2
      ----------------
      With 2^i = n^2 (so 2^(i/2) = n):  T(n^2) <= 4n - 2,  i.e.  T(n^2) = O(n)
      ----------------

- Let S(2^(2j)) be the time taken by the corresponding sorting algorithm (sort both halves in parallel, then merge them):
      S(1) = 0
      S(2^(2j)) = S(2^(2j-1)) + T(2^(2j))
                = T(2^(2j)) + T(2^(2j-1)) + S(2^(2j-2))
                = T(2^(2j)) + T(2^(2j-1)) + T(2^(2j-2)) + S(2^(2j-3))
                = ...
      With n = 2^j, each pair T(2^(2m)) + T(2^(2m-1)) <= 2*T(2^(2m)) < 2*4*2^m, so
      S(n^2) < 2*4*[n + n/2 + n/4 + n/8 + .... + 2] < 16n
      -------------
      S(n^2) = O(n)
      -------------

QUESTION : If you have a hypercube of N = n^2 processors, which algorithm would you prefer?
   - Bitonic sort specifically designed for the hypercube, OR
   - Embed an n x n mesh onto the hypercube and use the algorithm above.
   Explain why.
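The three final configurations listed for the mesh (row-major, shuffled row-major, snake-like row-major) are different maps from a processor position (r, c) to its rank in the sorted order. The Python sketch below assumes 0-based row and column numbers and, for shuffled row-major, a square 2^q x 2^q mesh whose index is obtained by interleaving the bits of r and c (row bit first); this assumption reproduces the STAGE-4 result above. All function names are hypothetical.

```python
def row_major(r, c, cols):
    """(i) Row-major: each row left to right, rows top to bottom."""
    return r * cols + c


def snake_row_major(r, c, cols):
    """(iii) Snake-like row-major: every second row (2nd, 4th, ...) runs right to left."""
    return r * cols + (c if r % 2 == 0 else cols - 1 - c)


def shuffled_row_major(r, c, cols):
    """(ii) Shuffled row-major on a 2**q x 2**q mesh (cols = 2**q):
    interleave the bits of r and c, row bit first, most significant pair first."""
    q = cols.bit_length() - 1
    idx = 0
    for bit in range(q - 1, -1, -1):
        idx = (idx << 2) | (((r >> bit) & 1) << 1) | ((c >> bit) & 1)
    return idx


def layout(index_fn, rows, cols):
    """Matrix whose entry (r, c) is the rank of processor (r, c) under index_fn."""
    return [[index_fn(r, c, cols) for c in range(cols)] for r in range(rows)]


def read_out(matrix, index_fn):
    """List the entries of matrix in the order defined by index_fn."""
    cols = len(matrix[0])
    cells = [(index_fn(r, c, cols), v)
             for r, row in enumerate(matrix) for c, v in enumerate(row)]
    return [v for _, v in sorted(cells)]


if __name__ == "__main__":
    for row in layout(shuffled_row_major, 4, 4):
        print(row)
    # [0, 1, 4, 5]
    # [2, 3, 6, 7]
    # [8, 9, 12, 13]
    # [10, 11, 14, 15]
    hybrid = [[1, 2], [6, 4], [7, 8], [12, 10], [14, 15], [18, 17], [22, 25], [32, 27]]
    print(read_out(hybrid, snake_row_major))
    # [1, 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 22, 25, 27, 32]
```

Reading the hybrid merge result in snake-like order, or the STAGE-4 result in shuffled row-major order, gives a nondecreasing list, which is what "sorted" means for those configurations.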