<aside> 🐝
We should always profile code to find true bottlenecks instead of guessing what might be slow
</aside>
Clock frequency, cycles per instruction (CPI), and FLOPS are each misleading on their own
The best metric is wall-clock time (seconds to completion):
$$ \text{Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Period} $$
$$ \text{Speedup} = \frac{1}{(1-f)+\frac{f}{s}} $$
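
The speedup formula above (Amdahl's law) is easy to check numerically. The following is a minimal sketch, not a definitive implementation: `amdahl_speedup` is a hypothetical helper name, with `f` taken as the fraction of execution time spent in the optimized part and `s` as that part's speedup factor.

```python
import math


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the runtime is made s times faster.

    Hypothetical helper for illustration; implements 1 / ((1 - f) + f / s).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Optimizing half of the program: the untouched half caps the overall gain.
for s in (2, 10, math.inf):
    print(f"f=0.5, s={s}: overall speedup = {amdahl_speedup(0.5, s):.3f}")
```

With `f = 0.5`, an infinitely fast optimized part still leaves the other half of the runtime in place, so the result approaches 2 and never exceeds it.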

f is the fraction of total execution time that the optimized part originally consumed

s is the speedup factor of the optimized part (i.e., making it 2x faster means s = 2)

If f is 50%, the maximum speedup is only 2x, no matter how large s is

| Tool Type | Examples |
|---|---|
| Software Timers | time command, gettimeofday(), times() |
| Hardware Timers | rdtsc instruction (read time stamp counter) |
| Hardware Performance Counters | perf, Intel VTune |
| Instrumentation and Profiling | gprof, gcov, Valgrind |
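
As a rough illustration of the software-timer row, here is a minimal sketch in Python rather than C. It assumes Python's `time.perf_counter()` (wall-clock) and `time.process_time()` (CPU time) as stand-ins for the C-level timers listed in the table. The `timed` helper and the `sum_of_squares` workload are hypothetical names made up for this example.

```python
import time


def timed(func, *args, repeats=5):
    """Run func several times; return (best wall seconds, best CPU seconds, result).

    Hypothetical helper for illustration. Taking the minimum over several
    runs reduces noise from other activity on the machine.
    """
    best_wall = float("inf")
    best_cpu = float("inf")
    result = None
    for _ in range(repeats):
        wall_start = time.perf_counter()   # monotonic wall-clock timer
        cpu_start = time.process_time()    # CPU time used by this process
        result = func(*args)
        cpu_end = time.process_time()
        wall_end = time.perf_counter()
        best_wall = min(best_wall, wall_end - wall_start)
        best_cpu = min(best_cpu, cpu_end - cpu_start)
    return best_wall, best_cpu, result


def sum_of_squares(n):
    """Toy workload standing in for the code under measurement."""
    return sum(i * i for i in range(n))


wall, cpu, value = timed(sum_of_squares, 200_000)
print(f"wall-clock: {wall:.6f} s, CPU: {cpu:.6f} s")
```

Wall-clock time is the number that matches the "seconds to completion" metric. The CPU figure is shown only for comparison, since a large gap between the two usually points to waiting (I/O, sleeping, contention) rather than computation.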