<aside>
⏰
For a given program, CPU performance is determined by clock speed and cycles per instruction (CPI)
</aside>
- While clock speed has risen along with transistor density, many architectural innovations have also reduced CPI, from many cycles per instruction to well below 1
- A CPI below 1 is only possible by completing multiple instructions per cycle
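The relationship in the callout is usually written as execution time = instruction count × CPI ÷ clock rate (often called the "iron law" of processor performance). A minimal Python sketch, using a hypothetical 1-billion-instruction workload on an assumed 3 GHz clock, shows why cutting CPI matters as much as raising clock speed:

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """Execution time = instructions x cycles-per-instruction / cycles-per-second."""
    total_cycles = instruction_count * cpi
    return total_cycles / clock_hz


# Hypothetical workload: 1 billion instructions on an assumed 3 GHz core
slow = cpu_time_seconds(1_000_000_000, cpi=4.0, clock_hz=3e9)  # multi-cycle design
fast = cpu_time_seconds(1_000_000_000, cpi=0.5, clock_hz=3e9)  # 2 instructions/cycle
print(f"CPI 4.0: {slow:.3f} s, CPI 0.5: {fast:.3f} s, speedup {slow / fast:.1f}x")
```

With the clock held fixed, dropping CPI from 4.0 to 0.5 makes the same program 8× faster.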
Evolution
Pre-1980s: Simple Processor
- No pipelining, instructions executed one after another
1980s: Pipelining
- Break instruction execution into stages (like an assembly line)
- While a single instruction still takes multiple cycles to complete (its latency), a new instruction can start every cycle
- Effect on CPI: ideally reduces CPI to 1 (one instruction completed per cycle)
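A rough cycle count makes the "ideally 1" concrete. This sketch assumes a classic 5-stage pipeline where every stage takes one cycle and there are no hazards or stalls; the unpipelined case models the simple processor above, where each instruction passes through all stages before the next one starts:

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    # Each instruction occupies the whole datapath for all of its stages
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int) -> int:
    # The first instruction needs `stages` cycles to fill the pipeline;
    # after that, one instruction completes every cycle (no stalls assumed)
    return stages + (n_instructions - 1)


STAGES = 5  # assumed: IF, ID, EX, MEM, WB
for n in (1, 10, 1000):
    base = unpipelined_cycles(n, STAGES)
    pipe = pipelined_cycles(n, STAGES)
    print(f"n={n:5d}  CPI unpipelined={base / n:.2f}  CPI pipelined={pipe / n:.3f}")
```

The pipeline-fill cost is paid once, so pipelined CPI approaches 1 only as the instruction stream gets long.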
Early 1990s: Superscalar
- Add multiple execution units to the pipeline, allowing multiple instructions to be processed in the same stage each cycle
- Effect on CPI: reduces CPI to < 1 (multiple instructions completed per cycle)
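Extending the same idealized model: a superscalar core of issue width `width` can send up to `width` instructions down the pipeline together. The sketch assumes every instruction is independent and no execution unit is ever oversubscribed, which real code rarely achieves:

```python
import math


def superscalar_cycles(n_instructions: int, stages: int, width: int) -> int:
    # Idealized: `width` independent instructions enter the pipeline together
    # each cycle, so the stream forms ceil(n / width) groups
    groups = math.ceil(n_instructions / width)
    return stages + (groups - 1)


STAGES = 5   # assumed pipeline depth
N = 1000     # hypothetical instruction count
for width in (1, 2, 4):
    cycles = superscalar_cycles(N, STAGES, width)
    print(f"width={width}  CPI={cycles / N:.3f}  IPC={N / cycles:.2f}")
```

In this best case CPI approaches 1/width; data dependencies between instructions are what keep real cores from reaching it.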
Mid 1990s: Out of Order Execution
- When an instruction is stalled (e.g., waiting on data from memory), the CPU executes later, independent instructions whose operands are already available
- This keeps the execution units busy
- Effect on CPI: masks delays (like cache misses), further reducing the effective CPI
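A toy scheduler can show the masking effect. This is a sketch under heavy simplifying assumptions (single issue per cycle, unlimited instruction window, no register renaming or speculation, and a hypothetical 10-cycle cache miss); the `Instr` type and `run` function are illustrative names, not any real simulator's API:

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    latency: int                                    # cycles until the result is available
    deps: list[int] = field(default_factory=list)   # indices of producer instructions


def run(program: list[Instr], out_of_order: bool) -> int:
    """Toy single-issue scheduler; returns the cycle at which every result is ready."""
    ready_at = [None] * len(program)   # cycle at which each result becomes available
    issued = [False] * len(program)
    cycle = 0
    while not all(issued):
        for i, ins in enumerate(program):
            if issued[i]:
                continue
            operands_ready = all(
                ready_at[d] is not None and ready_at[d] <= cycle for d in ins.deps
            )
            if operands_ready:
                issued[i] = True
                ready_at[i] = cycle + ins.latency
                break   # issue at most one instruction per cycle
            if not out_of_order:
                break   # in-order: the stalled instruction blocks all younger ones
        cycle += 1
    return max(ready_at)


program = [
    Instr("load r1 <- [a]", latency=10),             # assumed cache miss
    Instr("add  r2 <- r1 + 1", latency=1, deps=[0]),  # depends on the load
    Instr("mul  r3 <- r4 * r5", latency=3),           # independent of the load
    Instr("add  r6 <- r7 + r8", latency=1),           # independent of the load
    Instr("sub  r9 <- r3 - r6", latency=1, deps=[2, 3]),
]
print("in-order cycles:    ", run(program, out_of_order=False))
print("out-of-order cycles:", run(program, out_of_order=True))
```

In-order, everything queues behind the dependent `add` until the load returns; out-of-order, the `mul`, second `add`, and `sub` finish inside the miss shadow, so only the load's own latency remains visible.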
2000s: Simultaneous Multithreading
- A single physical CPU core presents itself as multiple logical cores to the OS
- Interleaves instructions from different software threads to fill empty slots in the execution units caused by stalls
- Effect on CPI: improves overall throughput by better utilizing the CPU's resources, lowering the aggregate CPI across all threads, even though each individual thread may run somewhat slower because it shares the core
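A slot-counting sketch illustrates how a second thread fills cycles the first one leaves idle. The issue width of 4 and the per-cycle "ready instruction" counts below are hypothetical, and the model deliberately ignores carry-over: ready instructions that don't fit in a cycle are simply not counted, so it only measures issue-slot utilization:

```python
def issued_per_cycle(threads: list[list[int]], width: int) -> list[int]:
    """Instructions issued each cycle when threads share `width` issue slots.

    Slots are filled from each thread in list order; ready instructions that
    do not fit are dropped rather than carried into later cycles.
    """
    issued = []
    for ready_counts in zip(*threads):
        free = width
        for ready in ready_counts:
            take = min(ready, free)
            free -= take
        issued.append(width - free)
    return issued


WIDTH = 4  # assumed issue slots per cycle
# Hypothetical ready-instruction counts per cycle; 0 = thread stalled that cycle
thread_a = [4, 1, 0, 0, 3, 2, 0, 4]
thread_b = [0, 2, 3, 4, 1, 0, 2, 1]

for label, threads in (("thread A alone", [thread_a]), ("A + B with SMT", [thread_a, thread_b])):
    issued = issued_per_cycle(threads, WIDTH)
    cycles, total = len(issued), sum(issued)
    print(f"{label}: slot utilization {total / (cycles * WIDTH):.0%}, "
          f"aggregate CPI {cycles / total:.2f}")
```

Thread A on its own leaves more than half the slots empty; with SMT, thread B's instructions land mostly in A's stall cycles, so the same hardware completes more instructions per cycle overall.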
Key Problems and Solutions