CSc-387 Parallel Processing

CHAPTER 12 : SEARCHING AND OPTIMIZATION
=======================================

- Combinatorial search and optimization techniques look for the best
  solution among many potential solutions.

- Typically, these problems are characterized by a huge number of
  permutations, so exhaustive search is infeasible. In most cases,
  no polynomial-time algorithm is known that finds the exact solution;
  many of these problems are NP-complete (or NP-hard in their
  optimization form).

- In almost all cases, some form of directed search technique is used
  to find suboptimal solutions.

Sample problems:

TRAVELING SALESMAN PROBLEM (TSP): find the shortest route that starts at
    one city and visits every other city in the list exactly once
    before returning to the first city.

0/1 KNAPSACK PROBLEM: objects are assigned weights and profit values.
    The goal is to pack a knapsack (of limited capacity) with selected
    objects so as to maximize profit.

N-QUEENS PROBLEM:

15- and 8-PUZZLES:

Popular Search and Optimization Techniques:
-------------------------------------------
- Branch-and-bound Search
- Dynamic Programming
- Hill Climbing (Greedy Algorithms)
- Simulated Annealing
- Genetic Algorithms

12.2. BRANCH-AND-BOUND SEARCH
=============================

State Space tree: (* T Figure 12.1 *)

- The tree could be explored using DEPTH-FIRST or BREADTH-FIRST search
  methods. Exhaustive search would be prohibitively expensive, so we
  must find ways to reduce the search.

- A BOUNDING FUNCTION which provides a LOWER BOUND (for minimization
  problems) or an UPPER BOUND (for maximization problems) must be
  devised.

- A common strategy: BEST-FIRST search.
- Avoid unpromising paths: PRUNING
  Another technique: BACKTRACKING

LIVE NODE: a node that has been reached, but not all of its children
           have been explored
E-NODE:    a node whose children are currently being explored
DEAD NODE: a node all of whose children have been explored

PARALLEL BRANCH AND BOUND:
---------------------------
- Allocate a portion of the state space tree, from a node downward,
  to one PE.
  =====> Parallel Depth-First search

ISSUES: How will each PE select the next E-node?
        Load balancing is difficult (dynamically expanding tree)
        Speedup anomalies: superlinear speedup is possible (how?)

12.3. GENETIC ALGORITHMS
========================

- Genetic algorithms mimic the process of natural evolution in biology.

- CROSSOVER: combining chromosomes from parents to produce offspring
  is the predominant mechanism by which chromosome patterns change.

- In addition, the "survival of the fittest" rule and "MUTATION" (an
  occasional random change in the chromosome pattern) are also part of
  the natural evolution process.

- GENETIC ALGORITHMS are computational approaches to problem solving
  that are modeled after the natural evolution process.

  --> Create an initial population (individuals).
  --> Evaluate their FITNESS (sort from "most fit" to "least fit").
  --> Select a subset of the population (randomly, but favoring the
      "more fit" individuals) for the production of offspring
      (CROSSOVER operation).
  --> Repeat the above steps until a satisfactory solution is reached.

(* Slide from text p.374 *)

ISSUES:
-------
- Representation of chromosomes (in a form that is suitable for crossover)
- How to produce the initial population?
- Fitness function
- Selection of parents
- Crossover function
- Termination condition

SAMPLE PROBLEM: Determine the x, y, z values that maximize
---------------
    f(x,y,z) = -x^2 + 1,000,000x - y^2 - 40,000y - z^2

over an integer domain of -1,000,000 to +1,000,000 for each variable.

A closed-form solution, obtained through factorization (completing the
square in x and in y), yields the following result:

    x = 500,000    y = -20,000    z = 0

However, it is not always easy to find a closed-form solution.

Size of the search space for an EXHAUSTIVE SEARCH:

    (2,000,000)^3 = 8*10^18

Even if one evaluation of the function takes only 100 ns, an exhaustive
search would take 8*10^11 seconds > 2*10^8 hours on a single processor.
Even if we were able to speed up the process by a factor of 10,000
through a parallel implementation, it would still take more than
2*10^4 hours > 2 years.

Let's design a Genetic Algorithm to solve it.

DATA REPRESENTATION:
--------------------
- Use binary strings to represent x, y, and z.

- Since 2^20 < 2,000,000 < 2^21, we need at least 21 bits per variable.
  In the example below, the leading bit is the sign (0 = plus,
  1 = minus) and the remaining 20 bits are the magnitude.

- Chromosome i: concatenation of the 3 bit strings that belong to
  x_i, y_i, z_i

- Suppose
      x = +262,408 = 001000000000100001000
      y = + 16,544 = 000000100000010100000
      z = -  1,032 = 100000000010000001000

  Chromosome:
  001000000000100001000000000100000010100000100000000010000001000

- Each chromosome is 63 bits long.

- Use a pseudorandom number generator to create the initial population.

FITNESS EVALUATION:
-------------------
- For each (x,y,z), compute f(x,y,z).

- Rank the results based on the value of f(x,y,z):
  if f(x1,y1,z1) > f(x2,y2,z2), then (x1,y1,z1) is "more fit"
  than (x2,y2,z2).

- Constraints: obviously, x, y, and z must be within their domains
  -----------  (a 21-bit string can encode values outside the range
               -1,000,000 to +1,000,000). If they are not, they are
               considered not to be fit at all. One solution to this
               problem: scale the original values such that they fall
               within their domains.

NUMBER OF INDIVIDUALS:
----------------------
How many individuals should be in the initial population?
- If too many: takes a lot of time to compute
  If too few:  hard to get quality solutions
  Depending on the problem, pick 20-1000 individuals
  (what a range! (:-))

SELECTION OF PARENTS FOR CROSSOVER
----------------------------------
- Selective pressure: bias toward the more fit individuals
  ------------------
  - too much selective pressure:   converges quickly to a local optimum
  - too little selective pressure: slow convergence

Tournament Selection: Pick k individuals (typically k=2) randomly and
--------------------- keep the most fit one among them.
                      Repeat n times.

OFFSPRING PRODUCTION THROUGH CROSSOVER
--------------------------------------
Single-point Crossover: Pick a random position in two strings and swap
    the substrings, as shown in (* T Figure 12.2 *)

    Because the selection process is biased toward choosing the more
    fit individuals, as in nature, crossover slowly tends to force the
    population toward stronger (i.e. more fit) individuals.

Multipoint Crossover: several cuts are made and smaller portions are
    intermingled.

Uniform Crossover: each bit of the child is randomly selected from one
    of the parents.

MUTATION: causes of mutation in nature: disease, radiation,
--------- ingestion of various controlled substances.

Implementation: flip the bit value at a randomly picked position.
- Depending on the bit position selected, the effect can be very
  powerful (flipping a high-order bit changes the value much more than
  flipping a low-order bit).
- Rate of mutation: generally kept low. Otherwise, convergence is slow.

VARIATIONS:
-----------
- carry over a few of the most fit individuals to the next generation
- randomly create a few new individuals at each generation
- change the population size at each generation

TERMINATION CONDITIONS:
-----------------------
- Simple approach: stop after S generations.
  Obvious deficiencies: 1) what if it converged early on?
                        2) what if the solution is not good enough?
- Adopt an approach similar to the one used in numerical algorithms:
  stop when there is no further improvement in the solution.

- A third approach: stop iterating when the Degree of Similarity among
  the individuals is greater than a threshold.
  Rationale: the degree of similarity among the individuals increases
  as the population nears an optimum solution.
  This approach requires a measure for the Degree of Similarity.

============================
PARALLEL GENETIC ALGORITHMS
============================

Two approaches:

1) Let each PE operate independently on an isolated subpopulation,
   periodically sharing its "best" individuals with the others
   (migration).

2) Let each PE do a portion of each step of the algorithm (selection,
   crossover, and mutation) on the common population.

MIGRATION OPERATOR
------------------
An example from nature: rabbits introduced into Australia.

Related tasks: selection of the emigrants, sending and receiving the
emigrants, and integrating them.

** Note that your intuition and creativity play an important role in
   implementing these tasks. **

- One popular approach to the selection and integration of emigrants:
  after every k steps, migrate the best individuals from your own
  population, and replace your worst ones with the ones received from
  other populations. Selective pressure must be applied in a
  controlled way.

Two migration models:

1) The Island Model: no restrictions on where an individual may
   migrate (* T Figure 12.3 *)

2) The Stepping Stone Model: individuals may only migrate to
   neighbors (* T Figure 12.4 *)

A theory from nature: new species are likely to form quickly in
subpopulations following a change in the environment.

- Since migration introduces new genetic material into the population,
  it is believed that new solutions could develop due to migration.

- The communication overhead caused by migrating after every k steps
  prevents the algorithm from achieving a P-fold speedup.
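
The "migrate your best, replace your worst" approach described above can
be sketched as follows. This is only a minimal serial illustration under
stated assumptions, not the textbook's implementation: the name
migrate_ring, the ring-shaped neighbor layout (one possible Stepping
Stone arrangement), and the parameter n_migrants are all hypothetical,
and each list in `islands` stands in for the subpopulation held by one
PE. In a real parallel program the exchange would be done by message
passing between PEs.

```python
def migrate_ring(islands, fitness, n_migrants=1):
    """One migration step on a ring: island i sends copies of its best
    individuals to island i+1, which drops its worst ones to make room.

    Assumes n_migrants is smaller than every subpopulation (hypothetical
    parameter, not from the notes).
    """
    # Select the emigrants of every island first, so that an individual
    # received in this step is not forwarded again in the same step.
    emigrants = [
        sorted(pop, key=fitness, reverse=True)[:n_migrants]
        for pop in islands
    ]

    new_islands = []
    for i, pop in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]   # from the left neighbor
        keep = len(pop) - len(incoming)                # population size stays fixed
        survivors = sorted(pop, key=fitness, reverse=True)[:keep]
        new_islands.append(survivors + incoming)
    return new_islands


if __name__ == "__main__":
    # Toy individuals: plain integers whose fitness is their own value.
    islands = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(migrate_ring(islands, fitness=lambda v: v))
```

Calling migrate_ring once every k generations corresponds to the
"after every k steps" rule; sending to any island instead of only the
ring neighbor would correspond to the Island Model.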
LIMITED MIGRATION, as in the case of "The Stepping Stone Model", has
the advantage of introducing only a small communication overhead.
However, it may result in subpopulations converging to different local
optimum solutions.

- This isolation/local-optima phenomenon in genetic algorithms is
  believed to have occurred in nature as well! The Australian
  continent has species unique to that region: flightless birds and
  large hopping mammals.

- Obviously, to reduce the effective isolation, we need to increase
  the interprocessor communication, which in turn limits the speedup.

PARALLELIZING A COMMON POPULATION:
-----------------------------------
This is the other approach to parallelization. Processors work together
to implement each pass through the common loop: selecting the
individuals, applying crossover/mutation, evaluating the fitness, etc.

- The efficiency of this approach depends upon the time needed to
  compute the various operations (selection, crossover, mutation)
  relative to the time needed to distribute the population information
  to all PEs plus the time to collect the results back. In most cases,
  it may not pay off. However, in a shared memory system, the
  communication overhead will disappear for the most part, and this
  approach could be viable.

===========================
12.4. SUCCESSIVE REFINEMENT
===========================

In the problem above, represent (x, y, z) in a 3-D grid of points.
Range of each variable: -1,000,000 to 1,000,000

. First select a coarse grid and evaluate the function at each grid
  point, each time incrementing the values by 10,000.

. There will be (200 x 200 x 200) = 8*10^6 evaluations in this round.

. Select the K best individuals (points) from this round.

. A cube centered on each of the K points is evaluated using a finer
  grid (e.g. an increment of size 100).
  Need K * (10,000/100)^3 = K*10^6 additional evaluations.

. Again retain the K best points from each of the K subvolumes and
  continue the evaluations using an increment of 1.
  Need a total of (K^2 * 10^6) evaluations in this round: each of the
  K^2 retained points is the center of a cube with (100/1)^3 = 10^6
  grid points.

- This approach parallelizes readily. Use K processors and let each PE
  evaluate a different subvolume each time. At the end of each round,
  each PE selects its K best points and communicates with the others
  to come up with the K best points overall.

===================
12.5. HILL CLIMBING
===================

In a combinatorial optimization problem, if we accept only those
configurations which improve the solution (the hill climbing or greedy
approach), then we may easily get trapped in a local optimum.

If we want to find the global optimum solution, we should once in a
while visit configurations which worsen the solution.

However, there is another way of improving the chances of finding the
global optimum while using the "Hill Climbing" approach:

Start from N randomly selected locations (configurations), and use the
hill climbing technique from each of them to find an optimum
configuration. As we increase the number of starting configurations,
the probability that the global optimum solution is reached becomes
higher.
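
The multiple-start idea above can be sketched as follows. This is a
minimal illustration under stated assumptions, not code from the text:
the function names (hill_climb, random_restart), the two-peak test
function two_peaks, and the neighbor rule step_neighbors are all
hypothetical and chosen only to show a local optimum trapping a single
climb.

```python
import random


def hill_climb(f, start, neighbors):
    """Greedy ascent: move to the best neighbor until none improves f."""
    current = start
    while True:
        best = max(neighbors(current), key=f, default=current)
        if f(best) <= f(current):
            return current              # no neighbor is better: an optimum
        current = best


def random_restart(f, random_start, neighbors, n_starts, rng):
    """Run hill_climb from n_starts random configurations, keep the best."""
    results = [
        hill_climb(f, random_start(rng), neighbors) for _ in range(n_starts)
    ]
    return max(results, key=f)


def two_peaks(x):
    """Hypothetical landscape on 0..100: local peak at 20, global at 80."""
    return max(10 - abs(x - 20), 30 - abs(x - 80))


def step_neighbors(x):
    """Neighbors of x are x-1 and x+1, clipped to the domain 0..100."""
    return [n for n in (x - 1, x + 1) if 0 <= n <= 100]


if __name__ == "__main__":
    rng = random.Random(0)
    print(hill_climb(two_peaks, 5, step_neighbors))   # trapped at the local peak
    print(random_restart(two_peaks, lambda r: r.randint(0, 100),
                         step_neighbors, n_starts=20, rng=rng))
```

Because the N climbs are independent of one another, each PE can run
its own starting configurations and only the final comparison of the
results requires communication.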