Day 4:

I. Last Time:
A. Finished History:
Technological evolution:
Vacuum Tubes -> Transistors -> Integrated Circuits
(SSI, MSI, LSI, VLSI)

B. Hardware Component view:
CPU - Central Processing Unit:
ALU - Arithmetic and Logic Unit
Datapath
Registers
Control

Talking to the Computer:
        Input: Mouse, Keyboard 
         Output: Monitor, Printer, Blinky Lights 
         (I/O: Network, Modem, etc.) 
         Memory/Storage: 
           Random Access: "RAM" & "Cache" - Temporary "Volatile" Storage
                          - Fast/Memory Hierarchy 
           Non-Random Access (Sequential): Disks, HDs, CDs, Tapes 
                          - Non-Volatile Storage Slow/Mechanical 

   C. Software Hierarchy & Components 
Hardware -> OS -> User Apps

D. Apps used by programmers:
      Source File -> (Compiler) -> ASM File (ASM Lang) -> (Assembler) -> 
      Object File (Machine Lang) -> (Linker) -> Executable -> (Loader (OS)) -> 
      Running Program

II. New Stuff:
    A. All you want to know about assembly 
       Why know/understand ASM: 
         1. Can be smaller / Faster 
         2. Sometimes necessary to work with hardware 
         3. Can sometimes better utilize Processors by using an 
            instruction which the compiler would not. 
         4. On Some Processors, a compiler isn't available yet. (New design) 
         5. Sometimes need to be able to read to debug a program 
         6. Sometimes need to be able to read it to estimate the run time of a program 
       The Downside: 
         1. Not very portable (and the "advantages" are often lost if the 
            code is made even a little portable) 
         2. Expansion: 1 inst in C is often 5+ insts in asm 
         3. Tedious/Complex: Often only 8-20 variables to use. 
            (In our case, bad variable names, shared variables)
         4. VERY difficult to read/understand 

    B. Misc:
         Linking is the process of "sewing" the various pieces of a 
         program together. 

            Profiler: "Profiles" execution of code. Gives actual "timing"
                       data of how long parts of the code take to execute 
            
            Loader: A component of the OS which actually "loads" a file to 
                    be run. (The file must be in the proper format for the 
                    loader to interpret) 
                    A loader is often responsible for verifying that the 
                    "dynamic" parts of a program are present and "linking" 
                    to them. 
                    (Ever seen the Windows "Missing DLL" window? 
                     That's because a program was dynamically linked 
                     and the loader failed to locate something that it 
                     needed)       

    C. What is Performance: Ex Fig 2.1 Pg 55: Which is best?
       Plane              Passengers   Range (mi)   Speed (mph)   Passenger Throughput
                                                                  (passengers * speed)
       Boeing 777            375          4630          610            228,750
       Boeing 747            470          4150          610            286,700
       Concorde              132          4000         1350            178,200
       Douglas DC-8-50       146          8720          544             79,424

       1. Performance is a way to answer the question: "Which is best"
       2. But it also must take into account "Best for what"
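
       The throughput column of the table is simply passengers times speed, and
       the "best" plane changes with the measure chosen. A minimal Python sketch
       of that idea, using the figures from the table (the variable names are
       illustrative, not from the text):

```python
# Figures copied from the table: (passengers, range in miles, speed).
planes = {
    "Boeing 777": (375, 4630, 610),
    "Boeing 747": (470, 4150, 610),
    "Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}

# Passenger throughput = passengers * speed.
throughput = {name: p * s for name, (p, r, s) in planes.items()}

# "Which is best" depends on which measure we ask about.
best_capacity = max(planes, key=lambda n: planes[n][0])
best_range = max(planes, key=lambda n: planes[n][1])
best_speed = max(planes, key=lambda n: planes[n][2])
best_throughput = max(throughput, key=throughput.get)

for name, value in throughput.items():
    print(f"{name:16s} {value:>8,d}")
print("capacity:", best_capacity)
print("range:", best_range)
print("speed:", best_speed)
print("throughput:", best_throughput)
```

       Each of the four measures picks out a plane, and they do not all agree,
       which is the point of asking "Best for what".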

    D. Best in terms of what:
       There are many things which computers may be measured against. 
       Really, the most important task is to first think about what the 
       machine(s) in question will be used for prior to attempting to 
       determine which is best. I.e. Best at WHAT?

    E. Common Measures of Computer Performance:
       1. Graphics/Drawing Time: How many X can be drawn per second?
                   (High Quality Games for example - Often this is 
                    largely a function of the video card and memory
                    rather than the CPU)
       2. Disk ACCESS: How long does it take to access/retrieve X? 
                    For example a large database will be almost solely
                    contained on a disk. The disk speed is probably a 
                    lot more critical than the CPU speed 
       3. Throughput: How many questions can be answered per second?
                    Again, in a large, multi-user database a company 
                    might want to answer as many questions per second 
                    as possible. EX. Amazon.com
       4. Execution/Response Time: How long does it take to run this 
                    program? This is the one that we'll worry about. 
                    And it's the one most commonly used by ASM programmers.
       5. Power Consumption - Transmeta/Ugh. 
          Hybrid: Power vs. computation, etc.

    F. Types of Comparisons: Analytical vs. Experimental
       1. Analytical - Do a rigorous analysis using a model of the 
              computer and find the exact time(s) (or ranges).
          Done when:
             a. Time constraints are VERY important 
                (Ex: controlling rocket thrusters)
             b. Easiest with simple/small program
                (Single known task)
             c. Typically time consuming/difficult, tedious task

       2. Experimental - Run similar programs to what the machine 
               will be doing in hopes of "guesstimating" performance
               (Benchmarking)
          Done when: 
             a. Analytical is not possible and time constraints aren't 
                critical
             b. Unknown workload (but can identify similar workloads)
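
          As a rough illustration of the experimental approach, one might time
          a stand-in workload several times with Python's timeit module. This is
          only a sketch: candidate_workload is a hypothetical placeholder for
          "a program similar to what the machine will be doing", and the repeat
          counts are arbitrary.

```python
import timeit


def candidate_workload(n=10_000):
    # Hypothetical stand-in for a program similar to the real workload.
    return sum(i * i for i in range(n))


CALLS_PER_RUN = 20

# Take several measurements; the minimum is the one least disturbed by
# whatever else the machine happened to be doing at the time.
runs = timeit.repeat(candidate_workload, number=CALLS_PER_RUN, repeat=5)
best_per_call = min(runs) / CALLS_PER_RUN

print(f"best time per call: {best_per_call:.6f} s")
```

          The number reported is only as meaningful as the stand-in workload is
          representative of the real one.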

     G. Analytical Techniques
        For most of our purposes: Performance = 1/execution time
       EX:
          Machine A computes a task in 14s.
          Machine B computes the same task in 7s.
          We would say that Machine B is twice as fast as A.
          Perf_A = 1/14s, Perf_B = 1/7s. Perf_B/Perf_A = 2
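
          The same arithmetic as a small Python sketch (the function names are
          illustrative):

```python
def performance(execution_time_s):
    # Performance is defined here as 1 / execution time.
    return 1.0 / execution_time_s


def relative_performance(time_a_s, time_b_s):
    # How many times faster B is than A: Perf_B / Perf_A.
    return performance(time_b_s) / performance(time_a_s)


ratio = relative_performance(14.0, 7.0)
print(f"Machine B is {ratio:.1f}x as fast as Machine A")
```

          Note that Perf_B / Perf_A is the same as Time_A / Time_B, so the ratio
          can be read straight off the two execution times.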

       1. There are different "components" to execution time:
               User Time: The whole thing. How long it "feels" 
                   like it takes to a user
                   (Often Called "Wall Clock" time)
               CPU Time = User CPU Time + System CPU Time
               user CPU Time: Actual time spent by CPU on YOUR task in 
                      YOUR code
               system CPU Time: Time spent by the CPU in OS tasks 
                      (Waiting for IO, etc.) for YOUR prog.
          We usually worry mainly about USER time, although most of our 
          examples are simple enough to consider user CPU time the same 
          as user time.
          (The Unix time command corresponds to the above: it reports 
           "real" for the wall clock time, and "user" and "sys" for the 
           user and system CPU times)
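
          A minimal Python sketch of the difference between wall clock time and
          CPU time. The busy_work function and the 0.2s sleep are made up for
          illustration; the sleep stands in for time spent waiting rather than
          computing.

```python
import os
import time


def busy_work(n=200_000):
    # Pure computation: this shows up as user CPU time.
    return sum(i * i for i in range(n))


wall_start = time.perf_counter()
cpu_start = os.times()

busy_work()
time.sleep(0.2)  # waiting adds wall clock time but almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_end = os.times()
user_cpu = cpu_end.user - cpu_start.user
system_cpu = cpu_end.system - cpu_start.system

print(f"wall clock: {wall_elapsed:.3f} s")
print(f"user CPU:   {user_cpu:.3f} s")
print(f"system CPU: {system_cpu:.3f} s")
```

          On a typical run the wall clock figure is noticeably larger than the
          two CPU figures combined, because the sleep is counted only in the
          wall clock time.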

       2. System Clock Based Performance: It's all about megahertz (or is it?)
          Mega - Millions: 1,000,000; Hertz - Cycles Per second (A frequency).
          Clock Wave Looks like:
          (Square wave, time on the axis, up pulse is when a switch is "on")

          Period = 1/frequency. 2ns period (2ns clock cycle) = 500MHz
Period = How long a cycle is.
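
          The period/frequency conversion as a short Python sketch (the function
          names are illustrative):

```python
def period_ns(frequency_mhz):
    # period (s) = 1 / frequency (Hz); with MHz and ns the factor is 1000.
    return 1e3 / frequency_mhz


def frequency_mhz(period_ns_value):
    # frequency (Hz) = 1 / period (s); same factor in the other direction.
    return 1e3 / period_ns_value


print(frequency_mhz(2.0))  # the 2ns clock cycle from the example
print(period_ns(500.0))    # and back again
```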

          a. What's the clock used for?
             The clock is used to keep track and time the processing of 
                 asm instructions.
             Example: Performing an ADD:
                   5 6 3 2
                 + 4 3 9 9
                 ---------
                 1 0 0 3 1
             When we see an instruction (ADD), we can't instantly write
             down the answer; we must work through some intermediate
             steps to find it. A CPU must do the same thing.
             In this case, the add took us 4 small steps (one per column).
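
             The pencil-and-paper procedure can be sketched in Python, counting
             one step per column. This is only an analogy: a real CPU works on
             binary digits in hardware, and add_by_columns is a made-up name.

```python
def add_by_columns(a, b):
    """Add two non-negative integers one decimal column at a time."""
    digits_a = [int(d) for d in str(a)][::-1]  # least significant digit first
    digits_b = [int(d) for d in str(b)][::-1]
    width = max(len(digits_a), len(digits_b))
    digits_a += [0] * (width - len(digits_a))
    digits_b += [0] * (width - len(digits_b))

    carry = 0
    steps = 0
    result = []
    for da, db in zip(digits_a, digits_b):
        column_sum = da + db + carry
        result.append(column_sum % 10)  # digit written under the line
        carry = column_sum // 10        # digit carried to the next column
        steps += 1
    if carry:
        result.append(carry)

    value = int("".join(str(d) for d in reversed(result)))
    return value, steps


total, steps = add_by_columns(5632, 4399)
print(total, steps)
```

             For the example above this gives 10031 in 4 steps, one step per
             column, with the final carry written down at the end.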

             Instructions depend on the clock in vastly different ways. 
             Some Processors: All Instructions take the same number of clocks
                 Ex: All instructions 4 clocks (Add, sub, move, etc.)

             Some Processors: Different Instructions take different 
                     numbers of clocks
                 Ex: Typical MIPS: Add = 4 clocks, Load = 5 clocks

              Most Processors: Different Instructions take different 
                 numbers of clocks and may depend on memory speed/etc.
                 Ex: Modern Intel Machines

             Common Fallacy: Higher Clock Speed means faster. 
                    Reality: Although this is sometimes true, 
                             it is rarely guaranteed if the clock speeds are 
                             even close to one another.
                             (The SPEC paper shows some examples)
                 I.e. when comparing a 1MHz machine to a 500MHz machine, 
                      I'd feel pretty comfortable saying that the 500MHz 
                      machine is faster (Although there really could be a 
                      few exceptions). When comparing a 550MHz machine to 
                      a 500MHz machine, I'd be a LOT more cautious - 
                      There are just too many factors affecting speed to be 
                      sure without more knowledge. We'll see this next time.

        3. CPU Time = Total CPU Cycles * Cycle Time 
           See Example on Page 60:
             1. Compute total number of clock cycles required by A
             2. Multiply by 1.2
             3. This tells us how many clock cycles must be completed in 6s, 
                so divide by 6 to determine clock rate     
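
           The three steps can be sketched in Python. The 1.2 factor and the 6s
           target come from the steps above; the 10s run time and 400MHz clock
           for machine A are assumed values for illustration only and are not
           given in these notes.

```python
def cpu_time_s(total_cycles, clock_rate_hz):
    # CPU Time = Total CPU Cycles * Cycle Time, with Cycle Time = 1 / clock rate.
    return total_cycles / clock_rate_hz


# Assumed figures for machine A (illustrative, not from these notes).
time_a_s = 10.0
clock_rate_a_hz = 400e6

# Step 1: total clock cycles required by A.
cycles_a = time_a_s * clock_rate_a_hz

# Step 2: B needs 1.2 times as many cycles.
cycles_b = 1.2 * cycles_a

# Step 3: B must complete those cycles in 6s, so divide by 6.
target_time_b_s = 6.0
clock_rate_b_hz = cycles_b / target_time_b_s

print(f"required clock rate for B: {clock_rate_b_hz / 1e6:.0f} MHz")
```

           With these assumed inputs B needs roughly twice A's clock rate to
           finish in 6s, even though the run time drops by less than half,
           because B also needs 20% more cycles.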

        4. ISA vs. Implementation
           a. ISA: Instruction Set Architecture
              This refers to the set of instructions, encoded in a specific 
              format of 1s and 0s, that a processor is able to read.
              I.e. an Intel and an AMD processor may have the same ISA. 
              They both can read and run the same program.
           b. Implementation: The internal details of how the processor 
              accomplishes reading/running a program. 

              The previous example changed just the implementation.
              (In different ISAs a number means different instructions:
               0x77 may mean ADD to one CPU and MULT to another)

              Ex: Intel x86 (may) take 4 clocks for an ADD
                  AMD x86 may only take 3.  
                  If they are both run at the same clock speed, 
                  the AMD will be faster. But if the Intel can 
                  be run 33% faster than the AMD, they'll be the same...
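
              The arithmetic behind this example, as a Python sketch. The clock
              counts (4 and 3) come from the example; the 500MHz base clock is
              an assumed value chosen only to make the numbers concrete.

```python
def add_time_ns(clocks_per_add, clock_rate_mhz):
    # time = clocks * period, and period (ns) = 1000 / frequency (MHz)
    return clocks_per_add * 1e3 / clock_rate_mhz


base_mhz = 500.0  # assumed clock rate, for illustration only

# Same clock speed: fewer clocks per ADD wins.
intel_same_clock = add_time_ns(4, base_mhz)
amd_same_clock = add_time_ns(3, base_mhz)

# Intel clocked 33% (one third) faster: the two come out even.
intel_faster_clock = add_time_ns(4, base_mhz * 4 / 3)

print(f"same clock:   Intel {intel_same_clock:.1f} ns, AMD {amd_same_clock:.1f} ns")
print(f"Intel at 4/3: Intel {intel_faster_clock:.1f} ns, AMD {amd_same_clock:.1f} ns")
```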

           The Intel and AMD processors have vastly different implementations.
           The ONLY time we can even begin to use clock speed as a measure of 
           performance is when comparing two machines with the same ISA AND the 
           same Implementation AND all other components in the systems are 
           identical. (This last part is due to Amdahl's Law...)

 III. Next Time:
A. Continue Performance