2. CPU Architectures

<aside> ⏰

For a given program, CPU performance is determined by clock speed and cycles per instruction (CPI): execution time = instruction count × CPI ÷ clock rate

</aside>
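A quick worked example of the relationship in the callout, as a minimal sketch. The workload size, clock rate, and CPI values are made-up numbers for illustration, and `execution_time_seconds` is a hypothetical helper name.

```python
def execution_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


# Hypothetical workload: 1 billion instructions on a 2 GHz core.
slow = execution_time_seconds(1_000_000_000, cpi=4.0, clock_hz=2_000_000_000)
fast = execution_time_seconds(1_000_000_000, cpi=0.5, clock_hz=2_000_000_000)

print(slow, fast)  # 2.0 0.25
```

At the same clock speed, dropping CPI from 4 to 0.5 makes the same program run 8x faster, which is why the rest of this page focuses on CPI.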

Clock speeds rose as transistors shrank, but architectural innovations have also cut CPI from many cycles per instruction to well below 1
- A CPI below 1 means the core completes more than one instruction per cycle

Evolution

Pre-1980s: Simple Processor

No pipelining: instructions execute one after another, each finishing before the next begins
Effect on CPI: many cycles per instruction

1980s: Pipelining

Break instruction execution into stages (like an assembly line)
A single instruction still takes multiple cycles to complete (latency), but a new instruction can start every cycle (throughput)
Effect on CPI: ideally reduces CPI to 1 (one instruction completed per cycle)
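
To see why pipelining pushes CPI toward 1, here is a minimal cycle-count sketch. It assumes an ideal pipeline with no stalls or hazards; the function names and the 5-stage, 1000-instruction figures are illustrative, not taken from any specific processor.

```python
def unpipelined_cycles(n_instructions, n_stages):
    # Each instruction occupies the whole processor for n_stages cycles.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    # The first instruction needs n_stages cycles to travel through the pipeline;
    # after that, one instruction completes every cycle (no stalls assumed).
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def cpi(cycles, n_instructions):
    return cycles / n_instructions


n, stages = 1000, 5
print(cpi(unpipelined_cycles(n, stages), n))  # 5.0
print(cpi(pipelined_cycles(n, stages), n))    # 1.004
```

Latency per instruction is still 5 cycles in both cases; only the throughput changes.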

Early 1990s: Superscalar

Add multiple execution units to the pipeline, allowing multiple instructions to be processed in the same stage each cycle
Effect on CPI: reduces CPI to < 1 (multiple instructions completed per cycle)
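
A rough sketch of how issue width changes the ideal CPI. It assumes every instruction is independent, so a core of width `issue_width` can always fill all of its slots; real code rarely allows that. `superscalar_cycles` is a hypothetical helper and the numbers are illustrative.

```python
import math


def superscalar_cycles(n_instructions, n_stages, issue_width):
    # Ideal case: every cycle, issue_width independent instructions
    # enter the pipeline together and move through it as a group.
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / issue_width)
    return n_stages + (groups - 1)


n, stages = 1000, 5
for width in (1, 2, 4):
    cycles = superscalar_cycles(n, stages, width)
    print(f"width={width} cycles={cycles} CPI={cycles / n:.3f}")
```

With width 1 this is the plain pipeline (CPI just above 1); with width 2 and 4 the ideal CPI approaches 1/2 and 1/4.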

Mid-1990s: Out-of-Order Execution

When an instruction is stalled (e.g., waiting on data from memory), the CPU executes independent subsequent instructions that are ready to go
- This keeps the execution units busy
Effect on CPI: masks delays (like cache misses), further reducing the effective CPI
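
The idea can be illustrated with a toy scheduler, as a sketch under simplifying assumptions: one instruction issues per cycle, every register is written only once (so there are no name conflicts to resolve), and the 10-cycle load latency is a made-up number. The `run` function and the program encoding are hypothetical.

```python
def run(program, out_of_order):
    """program: list of (dest, sources, latency) tuples in program order.
    Returns the cycle at which the last result becomes available."""
    # Registers produced by the program are unavailable until their producer issues.
    ready_at = {dest: float("inf") for dest, _, _ in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        # In-order may only consider the oldest instruction;
        # out-of-order may pick the oldest *ready* one.
        window = pending if out_of_order else pending[:1]
        for instr in window:
            dest, sources, latency = instr
            # Registers the program never writes (like r0) count as ready at cycle 0.
            if all(ready_at.get(s, 0) <= cycle for s in sources):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                pending.remove(instr)
                break
        cycle += 1
    return finish


program = [
    ("r1", ("r0",), 10),  # load that misses the cache (assumed 10-cycle latency)
    ("r2", ("r1",), 1),   # needs the loaded value, so it must wait
    ("r3", ("r0",), 1),   # independent of the load
    ("r4", ("r3",), 1),
    ("r5", ("r4",), 1),
]

print(run(program, out_of_order=False))  # 14
print(run(program, out_of_order=True))   # 11
```

In the in-order run, the three independent instructions sit behind the stalled one; in the out-of-order run they execute during the load's wait, so the same five instructions finish in fewer cycles.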

2000s: Simultaneous Multithreading

A single physical CPU core presents itself as multiple logical cores to the OS
- The core interleaves instructions from different software threads to fill empty slots in the execution units caused by stalls
Effect on CPI: improves overall throughput by better utilizing the CPU's resources; each thread runs no faster, but the combined CPI across threads gets closer to the “less than 1” ideal
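
A toy model of the slot-filling effect, as a sketch under strong assumptions: the core issues one instruction per cycle, each thread is described only by how many cycles it stalls after each instruction, and thread 0 always gets priority. Real SMT cores are superscalar and use fairer policies. All names and numbers here are hypothetical.

```python
def smt_cycles(threads):
    """threads: one list per thread; each entry is the number of stall cycles
    that follow that instruction. Returns cycles until all threads are done."""
    next_idx = [0] * len(threads)   # next instruction to issue, per thread
    ready_at = [0] * len(threads)   # cycle when each thread may issue again
    cycle = 0
    done_at = 0
    while any(next_idx[t] < len(threads[t]) for t in range(len(threads))):
        for t in range(len(threads)):
            if next_idx[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][next_idx[t]]
                next_idx[t] += 1
                ready_at[t] = cycle + 1 + stall
                done_at = max(done_at, ready_at[t])
                break  # only one issue slot per cycle in this model
        cycle += 1
    return done_at


def one_at_a_time_cycles(threads):
    # Without SMT, the core runs each thread alone, one after the other.
    return sum(smt_cycles([t]) for t in threads)


# Two threads, four instructions each, one stall cycle after every instruction.
threads = [[1, 1, 1, 1], [1, 1, 1, 1]]
total_instructions = sum(len(t) for t in threads)

print(one_at_a_time_cycles(threads) / total_instructions)  # 2.0
print(smt_cycles(threads) / total_instructions)            # 1.125
```

Each thread still waits out its own stalls, but the other thread uses those otherwise idle cycles, so the combined CPI drops.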

Key Problems and Solutions