Day 4:
I. Last Time:
A. Finished History:
Technological evolution:
Vacuum Tubes -> Transistors -> Integrated Circuits
(SSI, MSI, LSI, VLSI)
B. Hardware Component view:
CPU - Central Processing Unit:
ALU - Arithmetic and Logic Unit
Datapath
Registers
Control
Talking to the Computer:
Input: Mouse, Keyboard
Output: Monitor, Printer, Blinky Lights
(I/O: Network, Modem, etc.)
Memory/Storage:
Random Access: "RAM" & "Cache" - Temporary "Volatile"
storage Fast/Memory Hierarchy
Non-Random Access (Sequential): Disks, HDs, CDs, Tapes
- Non-Volatile Storage Slow/Mechanical
C. Software Hierarchy & Components
Hardware -> OS -> User Apps
D. Apps used by programmers:
Source File -> (Compiler) -> ASM File (ASM Lang) -> (Assembler) ->
Object File (Machine Lang) -> (Linker) -> Executable -> (Loader (OS)) ->
Running Program
II. New Stuff:
A. All you want to know about assembly
Why know/understand ASM:
1. Can be smaller / Faster
2. Sometimes necessary to work with hardware
3. Can sometimes better utilize Processors by using an
instruction which the compiler would not.
4. On Some Processors, a compiler isn't available yet. (New design)
5. Sometimes need to be able to read it to debug a program
6. Sometimes need to be able to read it to estimate run time of a program
The Downside:
1. Not very portable (and the "advantages" are often lost if the code
is made even a little portable)
2. Expansion: 1 inst in C is often 5+ insts in asm
3. Tedious/Complex: Often only 8-20 variables (registers) to use.
(In our case, bad variable names, shared variables)
4. VERY difficult to read/understand
B. Misc:
Linking: The process of "sewing" the various pieces of a
program together.
Profiler: "Profiles" execution of code. Gives actual "timing"
data of how long parts of the code take to execute
Loader: A component of the OS which actually "loads" a file to
be run. (The file must be in the proper format for the
loader to interpret)
A loader is often responsible for verifying that the
"dynamic" parts of a program are present and "linking"
to them.
(Ever seen the Windows "Missing DLL" window?
That's because a program was dynamically linked
and the loader failed to locate something that it
needed.)
C. What is Performance: Ex Fig 2.1 Pg 55: Which is best?
Plane            Passengers  Range (mi)  Speed (mph)  Passenger Throughput
                                                      (passengers * speed)
Boeing 777          375         4630         610           228,750
Boeing 747          470         4150         610           286,700
Concorde            132         4000        1350           178,200
Douglas DC-8-50     146         8720         544            79,424
1. Performance is a way to answer the question: "Which is best"
2. But it also must take into account "Best for what"
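The "best for what" point can be checked directly against the table. The following is a minimal sketch that ranks the four planes under different criteria; the tuple layout and the helper name best_by are illustrative choices, not something from the textbook.

```python
# Rows copied from the table above: (name, passengers, range_mi, speed_mph)
PLANES = [
    ("Boeing 777", 375, 4630, 610),
    ("Boeing 747", 470, 4150, 610),
    ("Concorde", 132, 4000, 1350),
    ("Douglas DC-8-50", 146, 8720, 544),
]

def best_by(metric):
    """Return the name of the plane that maximizes the given metric function."""
    return max(PLANES, key=metric)[0]

fastest = best_by(lambda p: p[3])                  # cruising speed
longest_range = best_by(lambda p: p[2])            # range in miles
most_passengers = best_by(lambda p: p[1])          # capacity
highest_throughput = best_by(lambda p: p[1] * p[3])  # passengers * speed

print("Fastest:           ", fastest)
print("Longest range:     ", longest_range)
print("Most passengers:   ", most_passengers)
print("Highest throughput:", highest_throughput)
```

Each criterion can pick a different plane, which is why "which is best" has no answer until the measure is chosen.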
D. Best in terms of what:
There are many things which computers may be measured against.
Really, the most important task is to first think about what the
machine(s) in question will be used for prior to attempting to
determine which is best. I.e. Best at WHAT?
E. Common Measures of Computer Performance:
1. Graphics/Drawing Time: How many X can be drawn per second?
(High Quality Games for example - Often this is
largely a function of the video card and memory
rather than the CPU)
2. Disk ACCESS: How long does it take to access/retrieve X?
For example a large database will be almost solely
contained on a disk. The disk speed is probably a
lot more critical than the CPU speed
3. Throughput: How many questions can be answered per second?
Again, in a large, multi-user database a company
might want to answer as many questions per second
as possible. Ex: Amazon.com
4. Execution/Response Time: How long does it take to run this
program? This is the one that we'll worry about.
And it's the one most commonly used by ASM programmers.
5. Power Consumption - Transmeta/Ugh.
Hybrid: Power vs. computation, etc.
F. Types of Comparisons: Analytical vs. Experimental
1. Analytical - Do a rigorous analysis using a model of the
computer and find the exact time(s) (or ranges).
Done when:
a. Time constraints are VERY important
(Ex: controlling rocket thrusters)
b. Easiest with simple/small program
(Single known task)
c. Typically time consuming/difficult, tedious task
2. Experimental - Run programs similar to what the machine
will be doing in hopes of "guesstimating" performance
(Benchmarking)
Done when:
a. Analytical is not possible and time constraints aren't
critical
b. Unknown workload (but can identify similar workloads)
G. Analytical Techniques
For most of our purposes: Performance = 1/execution time
EX:
Machine A computes a task in 14s.
Machine B computes the same task in 7s.
We would say that Machine B is twice as fast as A.
Perf_A = 1/14s, Perf_B = 1/7s. Perf_B/Perf_A = 2
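The same comparison can be written as a short calculation. This is a minimal sketch of the Performance = 1/execution time definition; the function names performance and speedup are illustrative.

```python
def performance(execution_time_s):
    """Performance defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s

def speedup(time_slow_s, time_fast_s):
    """How many times faster the second machine is than the first."""
    return performance(time_fast_s) / performance(time_slow_s)

# Machine A takes 14 s, Machine B takes 7 s for the same task.
ratio = speedup(14.0, 7.0)
print(f"Machine B is {ratio:.1f} times as fast as Machine A")
```

Note that the ratio of performances is simply the inverse ratio of the execution times (14/7).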
1. There are different "components" to execution time:
User Time: The whole thing. How long it "feels"
like it takes to a user
(Often Called "Wall Clock" time)
CPU Time = User CPU Time + System CPU Time
user CPU Time: Actual time spent by CPU on YOUR task in
YOUR code
system CPU Time: Time spent by the CPU in OS tasks
(Waiting for IO, etc.) for YOUR prog.
We usually worry mainly about USER time, although most of our
examples are simple enough to consider user CPU time the same
as user time.
(The Unix "time" command reports these: "real" is wall clock time,
"user" is user CPU time, and "sys" is system CPU time.)
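The difference between wall clock time and CPU time can be observed from inside a program. The following is a rough sketch using Python's standard time.perf_counter() and os.times(); the helper measure and the toy workload busy_then_idle are made up for illustration, and the exact numbers will vary from machine to machine.

```python
import os
import time

def measure(task):
    """Run task() and return (wall_s, user_cpu_s, system_cpu_s)."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (wall_end - wall_start,
            cpu_end.user - cpu_start.user,
            cpu_end.system - cpu_start.system)

def busy_then_idle():
    # Hypothetical workload: some computation, then a wait.
    sum(i * i for i in range(200_000))  # CPU work in our own code
    time.sleep(0.2)                     # waiting: adds wall clock time, almost no CPU time

wall, user_cpu, system_cpu = measure(busy_then_idle)
print(f"wall={wall:.3f}s  user CPU={user_cpu:.3f}s  system CPU={system_cpu:.3f}s")
```

Because of the sleep, the wall clock time should come out noticeably larger than user CPU time plus system CPU time.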
2. System Clock Based: Performance - It's all about megahertz (or is it?)
Mega - Millions: 1,000,000; Hertz - Cycles Per second (A frequency).
Clock Wave Looks like:
(Square Wave, Time on Axis, Up Pulse is when a switch is "on")
Period = 1/frequency. 2ns period (2ns clock cycle) = 500MHz
Period = How long a cycle is.
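The period/frequency relationship is easy to get wrong by a factor of 1000 when mixing ns and MHz. Here is a small sketch of the conversion; the function names are illustrative, and the constant 1e3 comes from 1 MHz corresponding to a 1000 ns period.

```python
def period_ns(frequency_mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1e3 / frequency_mhz

def frequency_mhz(period_in_ns):
    """Clock rate in MHz for a clock period given in nanoseconds."""
    return 1e3 / period_in_ns

print(period_ns(500.0))      # period of a 500 MHz clock, in ns
print(frequency_mhz(2.0))    # clock rate for a 2 ns period, in MHz
```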
a. What's the clock used for?
The clock is used to keep track of, and time, the processing of
asm instructions.
Example: Performing an ADD:
5 6 3 2
+ 4 3 9 9
-----------
1 0 0 3 1
When we see an instruction (ADD), we can't instantly write
down the answer; we must work through some intermediate
steps to find it. A CPU must do the same thing.
In this case, the add took us 4 small steps.
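The pencil-and-paper addition above can be sketched as a loop that handles one column per step. This is only an analogy for how an instruction is spread over several clock ticks, not a model of a real adder circuit; the function name add_by_columns is made up for illustration.

```python
def add_by_columns(a, b):
    """Add two non-negative integers one decimal column at a time.

    Returns (total, steps), where steps counts the column additions.
    """
    digits_a = [int(d) for d in reversed(str(a))]  # least significant digit first
    digits_b = [int(d) for d in reversed(str(b))]
    width = max(len(digits_a), len(digits_b))
    digits_a += [0] * (width - len(digits_a))      # pad the shorter number with zeros
    digits_b += [0] * (width - len(digits_b))

    result_digits = []
    carry = 0
    steps = 0
    for da, db in zip(digits_a, digits_b):
        column_sum = da + db + carry
        result_digits.append(column_sum % 10)      # digit written under the column
        carry = column_sum // 10                   # digit carried to the next column
        steps += 1
    if carry:
        result_digits.append(carry)                # final carry becomes the leading digit

    total = int("".join(str(d) for d in reversed(result_digits)))
    return total, steps

total, steps = add_by_columns(5632, 4399)
print(total, "computed in", steps, "column steps")
```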
Instructions depend on the clock in vastly different ways.
Some Processors: All Instructions take the same number of clocks
Ex: All instructions 4 clocks (Add, sub, move, etc.)
Some Processors: Different Instructions take different
numbers of clocks
Ex: Typical multicycle MIPS: Add = 4 clocks, Load = 5 clocks
Most Processors: Different Instructions take different
numbers of clocks and may depend on memory speed/etc.
Ex: Modern Intel Machines
Common Fallacy: Higher Clock Speed means faster.
Reality: Although this is sometimes true,
it is rarely guaranteed if the clock speeds are
even close to one another.
(The SPEC paper shows some examples)
I.e. when comparing a 1MHz machine to a 500MHz machine,
I'd feel pretty comfortable saying that the 500MHz
machine is faster (Although there really could be a
few exceptions). When comparing a 550MHz machine to
a 500MHz machine, I'd be a LOT more cautious -
There are just too many factors affecting speed to be
sure without more knowledge. We'll see this next time.
3. CPU Time = Total CPU Cycles * Cycle Time
See Example on Page 60:
1. Compute total number of clock cycles required by A
2. Multiply by 1.2 (the new machine needs 1.2 times as many cycles as A)
3. This tells us how many clock cycles must be completed in 6s,
so divide by 6 to determine clock rate
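The three steps above can be sketched as a short calculation. The 1.2 factor and the 6 s target come from the steps; the 10 s run time and 400 MHz clock for machine A are assumed inputs chosen for illustration (these notes do not restate the textbook's numbers), and the function name is hypothetical.

```python
def required_clock_rate_hz(time_a_s, clock_rate_a_hz, cycle_factor, target_time_s):
    """Clock rate needed to finish in target_time_s.

    Uses CPU Time = Total CPU Cycles * Cycle Time, i.e. cycles = time * clock rate.
    """
    cycles_a = time_a_s * clock_rate_a_hz   # step 1: total clock cycles required by A
    cycles_b = cycle_factor * cycles_a      # step 2: multiply by 1.2
    return cycles_b / target_time_s         # step 3: divide by 6 s to get the clock rate

# Assumed inputs for machine A: 10 s run time at 400 MHz (illustrative values).
rate_b = required_clock_rate_hz(10.0, 400e6, 1.2, 6.0)
print(f"Required clock rate: {rate_b / 1e6:.0f} MHz")
```

Under these assumed inputs the new machine needs roughly twice A's clock rate, even though it is less than twice as fast, because it also needs more cycles.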
4. ISA vs. Implementation
a. ISA: Instruction Set Architecture
This refers to the ability to read a specific format of 1s and 0s
I.e. an Intel and an AMD processor may have the same ISA.
They both can read and run the same program
b. Implementation: The internal details of how the processor
accomplishes reading/running a program.
The previous example changed just the implementation.
(In different ISAs a number means different instructions:
0x77 may mean ADD to one CPU and MULT to another)
Ex: Intel x86 (may) take 4 clocks for an ADD
AMD x86 may only take 3.
If they are both run at the same clock speed,
the AMD will be faster. But if the Intel can
be run 33% faster than the AMD, they'll be the same...
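The Intel/AMD comparison can be worked through numerically. This is a minimal sketch; the 4-clock and 3-clock ADD figures are the hypothetical ones from the example, and the 300 MHz and 400 MHz clock rates are assumed values chosen so that one clock is 33% faster than the other.

```python
def time_per_instruction_ns(clocks, clock_rate_mhz):
    """Time for one instruction: clocks per instruction * clock period (in ns)."""
    return clocks * 1e3 / clock_rate_mhz

# Same clock speed (assumed 300 MHz): fewer clocks per ADD wins.
intel_same = time_per_instruction_ns(4, 300.0)
amd_same = time_per_instruction_ns(3, 300.0)

# Intel clocked 33% faster (400 MHz vs. 300 MHz): the ADD times match.
intel_fast = time_per_instruction_ns(4, 400.0)
amd_base = time_per_instruction_ns(3, 300.0)

print(f"Same clock:    Intel {intel_same:.2f} ns, AMD {amd_same:.2f} ns")
print(f"Intel at +33%: Intel {intel_fast:.2f} ns, AMD {amd_base:.2f} ns")
```

The time per instruction depends on the product of clocks per instruction and clock period, so neither number alone says which machine is faster.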
The Intel and AMD processors have vastly different implementations
The ONLY time we can even begin to use clock speed as a measure of
performance is when comparing two machines with the same ISA AND the
same Implementation AND all other components in the systems are
identical. (This last part is due to Amdahl's Law...)
III. Next Time:
A. Continue Performance