# Lecture 4 – Debugging and Profiling

### Overview

There is a golden rule in programming: **code does not do what you expect it to do — it does what you tell it to do**. Bridging that gap is the entire discipline of debugging. And once the code is correct, there is a second discipline: making sure it doesn't consume more time, CPU, or memory than it needs to. That is profiling.

This lecture covers the full spectrum from the simplest technique (print statements) to powerful systems-level tools (`strace`, `rr`, `perf`, eBPF), along with memory error detection, CPU flame graphs, and benchmarking. Understanding when and how to reach for each tool is the core skill this lecture develops.

#### Key Takeaways

* **Print debugging** is fast and often enough. **Logging** is print debugging done properly — with severity levels, structured output, and zero code removal required.
* **Debuggers** (gdb, lldb, pdb) let you pause, step through, and inspect a program's execution state at any point. Use them when you don't yet know what to print.
* **Record-replay debugging** (`rr`) records a run deterministically and lets you replay it — including *backwards* — to find corruption and race conditions you cannot reproduce with print statements.
* **System call tracing** (`strace`, `bpftrace`) reveals how your program interacts with the OS kernel — what files it opens, what network calls it makes, where it hangs.
* **Memory sanitizers** (ASan, TSan, UBSan) instrument binaries at compile time to catch buffer overflows, use-after-free, data races, and undefined behavior — often catching bugs that don't crash immediately.
* **Wall clock time is misleading**. Use `time` to distinguish Real, User, and Sys time. Use `perf` and flame graphs to find the hot spots.
* **Profile before optimizing**. Premature optimization is the root of all evil — a profiler tells you *where* to spend effort. Intuition about hot spots is usually wrong.
* **Flame graphs** make profiling data readable. The width of a bar shows how much time was spent in that function (and everything it called).
* **`hyperfine`** is the right tool for benchmarking CLI programs — it handles warmup, statistical runs, and produces a human-readable comparison.

***

### Core Concepts

#### Print Debugging and Logging

The oldest debugging technique is also the most universally applicable: observe what the program is actually doing by making it tell you.

**Printf debugging** means inserting temporary `print` statements to trace execution flow and inspect values:

```python
def process_order(order):
    print(f"[DEBUG] order received: {order}")        # ← added
    total = calculate_total(order.items)
    print(f"[DEBUG] total calculated: {total}")      # ← added
    apply_discount(order, total)
    print(f"[DEBUG] after discount: {order.total}")  # ← added
```

This is fast, requires no tools, and works in any environment. Its limitation: you must know what to print in advance, and you must remove all those statements afterward.

**Logging** is printf debugging done right. Instead of transient print statements, you use a logging framework that provides:

| Feature                       | What it gives you                                                                         |
| ----------------------------- | ----------------------------------------------------------------------------------------- |
| **Severity levels**           | `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` — filter at runtime without changing code |
| **Structured output**         | JSON-formatted log entries that tools can parse, index, and query                         |
| **Configurable destinations** | Console, file, syslog, remote log aggregation                                             |
| **Persistent value**          | Logs stay in the codebase; future bugs may already be logged                              |

```python
import logging

logging.basicConfig(level=logging.DEBUG, format='%(levelname)s %(name)s: %(message)s')
logger = logging.getLogger(__name__)

def process_order(order):
    logger.debug("order received: %s", order)
    total = calculate_total(order.items)
    logger.info("order %s processed, total: %.2f", order.id, total)
```

> **The key habit:** When you've found and fixed a bug using print statements, don't just delete them. Convert them to proper `logger.debug()` calls. If a similar bug appears in the future, the diagnostic is already in place.

**Reading third-party logs:**

```bash
# System logs (Linux, systemd)
journalctl -u nginx              # logs for a specific service
journalctl -f                    # follow live (like tail -f)
journalctl --since "1 hour ago"  # time-filtered

# Application logs
tail -f /var/log/postgresql/postgresql.log
cat /var/log/nginx/error.log | grep "upstream"

# Verbose mode (many CLI tools support this)
curl -v https://example.com
ssh -vvv user@server              # triple-v for maximum detail
```

***

#### Interactive Debuggers

When you don't know what to print — when the bug is in code you don't yet understand, or only manifests in specific conditions — an **interactive debugger** is the right tool. Rather than running the program to completion, a debugger lets you **pause execution, inspect state, and step forward one instruction at a time**.

The core workflow in any debugger:

```
1. Set a breakpoint at a suspicious location
2. Run the program — it pauses when it hits the breakpoint
3. Inspect variables, the call stack, memory
4. Step forward: one line, one function call, or until the next breakpoint
5. Repeat until the bug is found
```

**GDB** (GNU Debugger) is the standard debugger for C, C++, and Rust on Linux:

| Command       | Action                                               |
| ------------- | ---------------------------------------------------- |
| `run`         | Start the program                                    |
| `b main`      | Set breakpoint at function `main`                    |
| `b file.c:42` | Set breakpoint at line 42 of file.c                  |
| `c`           | Continue execution until next breakpoint             |
| `next` / `n`  | Step over (execute current line, stay at same level) |
| `step` / `s`  | Step into (enter function calls)                     |
| `finish`      | Run until current function returns                   |
| `p variable`  | Print value of variable                              |
| `bt`          | Print backtrace (full call stack)                    |
| `watch expr`  | Break when the value of `expr` changes               |
| `info locals` | Print all local variables                            |
| `quit`        | Exit GDB                                             |

```bash
# Compile with debug symbols (essential for meaningful GDB output)
gcc -g -O0 program.c -o program

# Start GDB
gdb ./program

# Or use TUI mode for a split source/command view
gdb -tui ./program
```

**Python's `pdb`** — the equivalent for Python:

```python
import pdb

def buggy_function(data):
    result = []
    for item in data:
        pdb.set_trace()        # execution pauses here
        processed = transform(item)
        result.append(processed)
    return result
```

Or launch from the command line:

```bash
python -m pdb myscript.py
```

Inside `pdb`, the commands are similar to GDB: `n` (next), `s` (step), `c` (continue), `p variable` (print), `bt` (backtrace), `q` (quit).

**IDE debuggers** in VS Code wrap the same concepts in a GUI:

* Click the gutter to set a breakpoint (red dot)
* Press F5 to start debugging
* Use the Debug panel to inspect variables, the call stack, and watch expressions
* Step Over (F10), Step Into (F11), Continue (F5)

***

#### Record-Replay Debugging with `rr`

Some bugs are **Heisenbugs** — they disappear or change when you try to observe them. Race conditions, timing-dependent bugs, and memory corruption that manifests far from its source fall into this category. Traditional debugging fails here because running the program again produces different behavior.

**`rr`** (Mozilla's record-replay debugger) solves this by recording an entire program execution and replaying it **deterministically**. The same execution — same memory addresses, same thread scheduling, same system calls — can be replayed as many times as needed, including **in reverse**.

```bash
# Record a program run
rr record ./my_program

# Replay it (opens a GDB-like interface)
rr replay
```

**Reverse debugging commands** (available during replay):

| Command                   | Action                                            |
| ------------------------- | ------------------------------------------------- |
| `rc` / `reverse-continue` | Run backwards until the previous breakpoint       |
| `rs` / `reverse-step`     | Step backwards one line                           |
| `rn` / `reverse-next`     | Step backwards, skipping function internals       |
| `reverse-finish`          | Run backwards until entering the current function |

**The killer workflow** — finding memory corruption:

```
1. rr record ./program           → capture the crashing run
2. rr replay                     → replay deterministically
3. Run to the crash              → examine the corrupted value
4. watch students[1].id          → set a watchpoint on the corrupted variable
5. reverse-continue              → run backwards to find who wrote it
6. Execution stops at the exact line that performed the corruption
```

This is impossible with traditional debugging — you'd have to guess where the corruption happens. With `rr`, you find it exactly.

> `rr` works on Linux only and requires hardware performance counters. It doesn't run in most VMs or AWS EC2 instances, and doesn't support GPU programs. For macOS, check out [Warpspeed](https://warpspeed.dev/).

***

#### System Call Tracing

When a program behaves unexpectedly and you can't easily modify its source, **system call tracing** shows you exactly how it talks to the operating system: what files it opens, what network connections it makes, what processes it spawns.

**`strace`** (Linux) intercepts and logs every system call:

```bash
# Trace all system calls
strace ./my_program

# Trace only file-related calls (cleaner output)
strace -e trace=file ./my_program

# Attach to a running process by PID
strace -p 12345

# Follow child processes spawned by the program
strace -f ./my_program

# Show timing: how long each call takes
strace -T ./my_program
```

Common uses: finding which config file a program is failing to read, understanding why a program hangs (usually waiting on a read/write), tracing unexpected network calls.

**`bpftrace`** — for when you need more power:

`strace` works per-process. `bpftrace` uses **eBPF** to instrument the kernel itself, allowing system-wide tracing, aggregation, and filtering with very low overhead:

```bash
# Watch all file opens system-wide (every process)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

# Count system calls, grouped by name (Ctrl-C to print summary)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'

# Filter to one specific program
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* /comm == "nginx"/ { @[probe] = count(); }'

# Trace a command from startup (cpid = child PID of the launched command)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* /pid == cpid/ { @[probe] = count(); }' -c 'ls -la'
```

**Network debugging:**

```bash
# Capture all HTTP traffic
sudo tcpdump -i any port 80

# Save to file for Wireshark analysis
sudo tcpdump -i any -w capture.pcap

# Find what process is using a port
ss -tlnp | grep :8080
```

For HTTPS: use [mitmproxy](https://mitmproxy.org/) as an intercepting proxy, or simply use browser DevTools → Network tab (which shows decrypted traffic).

***

#### Memory Debugging

Memory bugs — buffer overflows, use-after-free, leaks — are among the most dangerous in systems programming. They often don't crash immediately; instead, they silently corrupt memory that only fails much later, far from the actual bug.

**Sanitizers** are compiler-level instruments that detect errors at runtime. They require recompilation but are fast enough for CI and daily development:

| Sanitizer                      | Flag                   | Detects                                         |
| ------------------------------ | ---------------------- | ----------------------------------------------- |
| **AddressSanitizer**           | `-fsanitize=address`   | Buffer overflow, use-after-free, memory leaks   |
| **ThreadSanitizer**            | `-fsanitize=thread`    | Data races in multithreaded code                |
| **MemorySanitizer**            | `-fsanitize=memory`    | Reads of uninitialized memory                   |
| **UndefinedBehaviorSanitizer** | `-fsanitize=undefined` | Integer overflow, null deref, misaligned access |

```bash
# Compile with AddressSanitizer
gcc -fsanitize=address -g program.c -o program
./program
# ASan prints a detailed error report if a memory bug is triggered
```

**Valgrind** — when you can't recompile:

Valgrind runs your program in a virtual machine and monitors every memory access. Slower than sanitizers (10-50x), but requires no source code:

```bash
valgrind --leak-check=full ./my_program

# Profiling tool (covered in Profiling section)
valgrind --tool=callgrind ./my_program
valgrind --tool=massif ./my_program
```

***

#### AI as a Debugging Assistant

LLMs have become genuinely useful for specific debugging tasks that are awkward with traditional tools:

| Task                          | Why LLMs help                                                                                                                             |
| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| **Cryptic compiler errors**   | C++ template errors and Rust borrow-checker messages can span dozens of lines. LLMs translate them to plain English and suggest fixes.    |
| **Cross-language debugging**  | Bugs that span a Python binding, a C library, and a build system are hard to reason about. LLMs can navigate multiple abstraction layers. |
| **Stack trace analysis**      | Paste a crash stack trace → LLM explains likely cause and points to the relevant code.                                                    |
| **Vague symptom correlation** | "Uses 10x more memory than expected" → LLM suggests likely causes and what to look for.                                                   |

**Limitations:**

* LLMs can hallucinate confident but wrong explanations
* They may suggest a patch that masks the bug rather than fixing it
* Always verify suggestions with actual tools and tests
* Best used as a complement to debugging tools, not a replacement

***

#### Timing and Resource Monitoring

Before profiling code, understand what resource is actually constrained.

**`time` — distinguish real from CPU time:**

```bash
time curl https://missing.csail.mit.edu &> /dev/null
# real    0m0.272s   ← wall clock (includes waiting)
# user    0m0.079s   ← CPU time in your code
# sys     0m0.028s   ← CPU time in kernel on your behalf
```

If `real >> user + sys`, the program is waiting (network, disk, locks). If `real ≈ user`, it's CPU-bound.

**Monitoring tools:**

| Tool       | What it shows                                                     |
| ---------- | ----------------------------------------------------------------- |
| `htop`     | Real-time CPU/memory per process; `F6` to sort, `t` for tree view |
| `btop`     | Modern alternative to htop with more metrics                      |
| `iotop`    | Disk I/O per process in real time                                 |
| `free -h`  | Total and available RAM                                           |
| `lsof`     | Which files/sockets a process has open                            |
| `ss -tlnp` | Network connections and which process owns them                   |
| `nethogs`  | Network bandwidth per process                                     |

***

#### CPU Profiling

Profiling tells you **where time is actually spent** — which functions are hot, which call chains are expensive. Without this, optimization is guesswork.

Two types of CPU profilers:

| Type         | How it works                                          | Overhead | Best for                          |
| ------------ | ----------------------------------------------------- | -------- | --------------------------------- |
| **Sampling** | Interrupts program \~1000x/sec, records current stack | Low      | Production, general use           |
| **Tracing**  | Records every function call and return                | High     | Exact call counts, small programs |

**`perf`** — the standard Linux sampling profiler:

```bash
# Quick stats overview
perf stat ./slow_program

# Record a profile
perf record -g ./slow_program

# Analyze interactively
perf report

# Generate a flame graph
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
```

**Flame graphs** are the visualization for profiling data. Reading one:

* **Y-axis:** Call stack depth (bottom = entry point, top = leaf functions)
* **X-axis:** Time proportion (wider = more CPU time spent here)
* **Colour:** Usually arbitrary grouping (not significance)
* **Interaction:** Click a bar to zoom in; the tooltip shows the exact function

If a bar at the top of the flame is wide, that function itself is slow. If a bar in the middle is wide, that function is spending time calling things below it.

**Callgrind** (Valgrind) — exact tracing profiler:

```bash
valgrind --tool=callgrind ./my_program
callgrind_annotate callgrind.out.<pid>   # text output
kcachegrind callgrind.out.<pid>          # GUI output
```

Use Callgrind when you need exact call counts (not statistical approximations), or when you want to simulate cache behavior with `--cache-sim=yes`.

***

#### Memory Profiling

**Massif** (Valgrind) tracks heap memory over time:

```bash
valgrind --tool=massif ./my_program
ms_print massif.out.<pid>
```

Output shows a graph of heap allocation over time — useful for finding memory that grows unboundedly (a leak) or excessive peak allocation.

***

#### Benchmarking with `hyperfine`

When comparing implementations or tools, `hyperfine` runs each command multiple times, handles warmup, and produces statistically sound comparison:

```bash
hyperfine --warmup 3 'fd -e jpg' 'find . -iname "*.jpg"'
# Output:
# Benchmark #1: fd -e jpg
#   Time (mean ± σ): 51.4 ms ± 2.9 ms
# Benchmark #2: find . -iname "*.jpg"
#   Time (mean ± σ): 1.126 s ± 0.101 s
# Summary: 'fd -e jpg' ran 21.89× faster
```

***

### Mental Model

#### Debugging as Hypothesis Testing

Every debugging session is a scientific experiment:

```
Observe anomaly
      ↓
Form hypothesis: "The bug is in function X because Y"
      ↓
Design test: set a breakpoint / add a print / run with sanitizer
      ↓
Execute test
      ↓
Does result confirm or refute hypothesis?
      ↓
Refine hypothesis and repeat, OR fix confirmed bug
```

The difference between a novice and an expert debugger is mostly the quality and speed of hypothesis formation. Experts form hypotheses that bisect the search space — each test eliminates half the possible locations for the bug.

***

#### The Debugging Tool Ladder

Choose the simplest tool that can answer your current question:

```
Question: "What is the value of X at this point?"
→ Print statement or logger.debug()

Question: "What is the execution path that leads to this state?"
→ Interactive debugger (gdb, pdb, IDE)

Question: "How did this memory get corrupted 3 function calls ago?"
→ rr (record-replay + reverse debugging)

Question: "What OS-level operations is this program performing?"
→ strace / bpftrace

Question: "Is there a memory error? Where exactly?"
→ AddressSanitizer / Valgrind

Question: "Is there a data race in this multithreaded code?"
→ ThreadSanitizer
```

Reaching for `strace` when a print statement would do wastes time. Reaching for print statements when `rr` is needed wastes more time. Calibrate the tool to the question.

***

#### Profiling: Measure First, Optimize Second

The canonical trap is **premature optimization**: spending effort speeding up code that isn't the bottleneck. Profiling destroys this trap because it shows you the real hot spots.

```
Intuition says:   "The bottleneck is the JSON parsing"
Profiler says:    "90% of time is in a regex called 10,000 times inside a loop
                   that could be compiled once outside the loop"
```

The mental model for flame graphs: imagine running the program and at a random moment taking a snapshot of the call stack. The flame graph shows the probability distribution of those snapshots. A wide bar = high probability of being on the stack = lots of time spent there.

***

#### Real vs. User vs. Sys Time

```
Program runs for 5 seconds (real time)
├── 0.5s waiting for network responses  (neither user nor sys)
├── 3.0s executing your code            (user time)
└── 1.5s kernel executing on your behalf (sys time)

time output:
  real  5m0.000s
  user  3m0.000s
  sys   1m5.000s
```

If `real >> user + sys` → waiting (I/O bound, network bound): profiling won't help much, you need to reduce waiting.

If `real ≈ user` → CPU bound: profiling will find the hot functions.

If `sys` is disproportionately high → excessive system calls (opening too many files, too many small allocations): `strace -c` will show you the syscall breakdown.

***

### Commands and Syntax

#### Command: `gdb`

```bash
gdb ./program             # start debugger
gdb -tui ./program        # TUI mode (split source + commands)
gdb --args ./prog a b c   # pass arguments to the program
```

Essential GDB commands (at the `(gdb)` prompt):

```gdb
run               # start the program
run arg1 arg2     # start with arguments
b main            # breakpoint at function main
b file.c:42       # breakpoint at line 42 of file.c
c                 # continue until next breakpoint
n                 # next (step over)
s                 # step (step into)
finish            # run until function returns
p variable        # print variable value
p *ptr            # dereference and print pointer
bt                # backtrace (call stack)
info locals       # print all local variables
watch expr        # break when expr changes value
x/10x $sp         # examine 10 hex words at stack pointer
q                 # quit
```

***

#### Command: `rr`

```bash
rr record ./program          # record execution
rr record ./program arg1     # record with arguments
rr replay                    # replay in GDB interface
rr replay -d gdb             # explicitly use GDB
```

During `rr replay` (in addition to all normal GDB commands):

```gdb
rc               # reverse-continue (run backward to prev breakpoint)
rs               # reverse-step
rn               # reverse-next
reverse-finish   # run backward until entering current function
```

***

#### Command: `strace`

```bash
strace ./program                    # trace all syscalls
strace -e trace=file ./program      # only file-related calls
strace -e trace=network ./program   # only network calls
strace -f ./program                 # follow child processes
strace -p 12345                     # attach to running PID
strace -T ./program                 # show time spent in each call
strace -c ./program                 # summary statistics only
strace -o trace.txt ./program       # save output to file
```

***

#### Command: `perf`

```bash
perf stat ./program              # performance counter summary
perf record -g ./program         # record with call graphs
perf report                      # interactive analysis of recording
perf script                      # raw output (for flame graph generation)
perf top                         # live, top-like view of CPU usage by function
```

***

#### Command: `valgrind`

```bash
valgrind ./program                          # basic memory error check
valgrind --leak-check=full ./program        # detailed leak info
valgrind --tool=callgrind ./program         # CPU tracing profiler
valgrind --tool=massif ./program            # heap memory profiler
valgrind --tool=helgrind ./program          # thread error detector
callgrind_annotate callgrind.out.<pid>      # analyze callgrind output
ms_print massif.out.<pid>                   # analyze massif output
```

***

#### Command: `hyperfine`

```bash
hyperfine 'cmd1' 'cmd2'                    # basic comparison
hyperfine --warmup 5 'cmd1' 'cmd2'         # with warmup runs
hyperfine --runs 20 'cmd1'                 # specific number of runs
hyperfine --export-markdown bench.md 'cmd1' 'cmd2'  # export results
hyperfine --parameter-scan n 1 10 'program --count {n}'  # parameter sweep
```

***

#### Command: `time`

```bash
time ./program               # basic timing
time sleep 1                 # test: should show ~1s real, ~0s user
/usr/bin/time -v ./program   # verbose: includes memory, page faults
```

***

#### Command: `ss`

```bash
ss -tlnp             # all TCP listening sockets with process info
ss -tlnp | grep 8080 # find what's on port 8080
ss -s                # socket statistics summary
```

***

#### Sanitizer Flags Reference

```bash
# AddressSanitizer
gcc -fsanitize=address -g program.c -o program

# ThreadSanitizer
gcc -fsanitize=thread -g program.c -o program

# MemorySanitizer (Clang only)
clang -fsanitize=memory -g program.c -o program

# UndefinedBehaviorSanitizer
gcc -fsanitize=undefined -g program.c -o program

# Combine multiple (address + undefined is common)
gcc -fsanitize=address,undefined -g program.c -o program

# Python equivalent: PYTHONFAULTHANDLER for crash tracebacks
PYTHONFAULTHANDLER=1 python myscript.py
```

***

### Command Flow Diagrams

#### Debugging Tool Selection Flowchart

```mermaid
graph TD
    START["Bug observed"] --> Q1{"Do you know\nwhat to print?"}

    Q1 -->|Yes| PRINT["Printf / logger.debug()"]
    Q1 -->|No| Q2{"Can you modify\nand rerun easily?"}

    Q2 -->|Yes| DBG["Interactive debugger\ngdb / pdb / IDE"]
    Q2 -->|No| Q3{"Is it a memory\ncorruption bug?"}

    Q3 -->|Yes| Q4{"Can you\nrecompile?"}
    Q3 -->|No| Q5{"Is it a race\ncondition / flaky?"}

    Q4 -->|Yes| ASAN["AddressSanitizer\n-fsanitize=address"]
    Q4 -->|No| VAL["Valgrind\n--leak-check=full"]

    Q5 -->|Yes| RR["rr record-replay\n+ reverse debugging"]
    Q5 -->|No| Q6{"OS-level\nbehavior?"}

    Q6 -->|Yes| STRACE["strace / bpftrace"]
    Q6 -->|No| LLM["Ask an LLM to explain\nthe error / stack trace"]

    style PRINT fill:#2d4a22,color:#fff
    style DBG fill:#1a3a5c,color:#fff
    style ASAN fill:#4a2d00,color:#fff
    style VAL fill:#4a2d00,color:#fff
    style RR fill:#4a1a5c,color:#fff
    style STRACE fill:#3a3a00,color:#fff
    style LLM fill:#2d2d2d,color:#fff
```

***

#### Profiling Tool Selection Flowchart

```mermaid
graph TD
    SLOW["Program is slow"] --> M1{"What resource\nis constrained?"}

    M1 -->|"Real >> User+Sys\n(I/O or network wait)"| IO["Check I/O: iotop\nCheck network: nethogs\nCheck syscalls: strace -c"]
    M1 -->|"Real ≈ User\n(CPU bound)"| CPU["CPU Profiler"]
    M1 -->|"High memory"| MEM["Memory Profiler\nValgrind Massif"]
    M1 -->|"Not sure"| MON["htop / btop\nto observe resources"]

    CPU --> Q2{"Production\nor development?"}
    Q2 -->|Production| PERF["perf record -g\n+ flame graph"]
    Q2 -->|Development| Q3{"Need exact\ncall counts?"}
    Q3 -->|Yes| CGRIND["Valgrind Callgrind\n+ kcachegrind"]
    Q3 -->|No| PERF

    PERF --> FLAME["Generate flame graph\nIdentify wide bars"]
    FLAME --> OPT["Optimize the hot path"]

    style IO fill:#1a3a5c,color:#fff
    style CPU fill:#2d4a22,color:#fff
    style MEM fill:#4a2d00,color:#fff
    style PERF fill:#4a1a5c,color:#fff
    style FLAME fill:#3a3a00,color:#fff
```

***

#### How rr Record-Replay Works

```mermaid
sequenceDiagram
    participant User
    participant rr
    participant Program
    participant Kernel

    User->>rr: rr record ./program
    rr->>Program: Launch with interception
    Program->>Kernel: syscall: read file
    Kernel-->>rr: intercept + log response + timestamp
    rr-->>Program: pass through response
    Note over rr: Records: syscall results,\ntiming, thread schedule

    User->>rr: rr replay
    rr->>Program: Replay from recording
    Program->>Kernel: syscall: read file
    rr-->>Program: replay recorded response\n(deterministic)

    User->>rr: reverse-continue
    rr->>Program: Rewind execution\nto previous state
    Note over User,Program: Can go forwards AND\nbackwards through same\ndeterministic execution
```

***

#### Memory Bug Timeline: Why They're Hard

```mermaid
graph LR
    A["Corruption happens\nhere (e.g., buffer overflow)"]
    B["Memory sits\ncorrupted silently"]
    C["Unrelated code reads\nthe corrupted memory"]
    D["Crash or wrong\noutput HERE"]

    A -->|"time passes"| B
    B -->|"more time"| C
    C --> D

    style A fill:#4a2d00,color:#fff
    style D fill:#4a0000,color:#fff

    NOTE["Traditional debugger starts\nat D and must find A.\nrr starts at D,\nreverses to A."]
```

***

#### Flame Graph Anatomy

```mermaid
graph TB
    subgraph "Flame Graph (read bottom to top, width = time)"
        ROOT["main() ██████████████████████████████████████ 100%"]
        A["process_request() ███████████████████████████ 78%"]
        B["handle_error() ████████ 22%"]
        C["parse_json() ████████████████████ 55%"]
        D["validate() ██████ 20%"]
        E["log_error() ████ 10%"]
        F["json_decode() ██████████████ 40%"]  
        G["regex_match() ████████████ 35%"]

        ROOT --> A
        ROOT --> B
        A --> C
        A --> D
        B --> E
        C --> F
        C --> G
    end

    NOTE["regex_match() is WIDE\n→ this is the hot spot\n→ optimize here first"]

    style G fill:#ff4444,color:#fff
    style NOTE fill:#2d4a22,color:#fff
```

***

### Command Pipeline Examples

#### Find What Process Is Using a Port

```bash
ss -tlnp | grep :8080
```

| Step         | What it does                                      |
| ------------ | ------------------------------------------------- |
| `ss -tlnp`   | Lists all TCP listening sockets with process info |
| `grep :8080` | Filters to the line matching port 8080            |

Output: `LISTEN 0 128 *:8080 *:* users:(("node",pid=1234,fd=10))`

Kill it:

```bash
kill $(ss -tlnp | grep ':8080' | awk '{print $6}' | grep -oP 'pid=\K\d+')
```

***

#### Profile, Visualize, and Fix

```bash
# 1. Time the program to understand what resource is constrained
time ./slow_program
# real 12.3s, user 12.1s → CPU bound

# 2. Record a profile
perf record -g ./slow_program

# 3. Quick text report
perf report

# 4. Generate a flame graph
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
open flamegraph.svg   # or firefox flamegraph.svg

# 5. After optimizing, benchmark the improvement
hyperfine --warmup 3 './slow_program' './optimized_program'
```

***

#### Find Memory Leak with Valgrind Massif

```bash
# Run with massif profiler
valgrind --tool=massif --pages-as-heap=yes ./server &
SERVER_PID=$!

# Generate some load
curl http://localhost:8080/api/endpoint
sleep 5
curl http://localhost:8080/api/endpoint

# Stop the server
kill $SERVER_PID

# Analyze memory over time
ms_print massif.out.$SERVER_PID | head -60
# Look for heap total growing without shrinking → leak
```

***

#### Trace File Access of an Unknown Binary

```bash
# What files does this program try to open?
strace -e trace=openat ./mystery_program 2>&1 | grep -v "= -1"
# -1 return codes = file not found; filter those out to see what it successfully opens

# Or use bpftrace for lower overhead, system-wide
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "mystery_program"/ { printf("%s\n", str(args->filename)); }'
```

***

### Real World Workflows

#### Debugging a Production Crash with Core Dumps

```bash
# Enable core dumps (set limit to unlimited)
ulimit -c unlimited

# Run the program — it crashes and writes core file
./my_service
# → Segmentation fault (core dumped)

# Analyze with GDB
gdb ./my_service core
# Inside GDB:
bt              # see the full stack trace at crash time
p *this         # inspect the object that caused the crash
info locals     # see all local variables in the crashing frame
frame 3         # switch to frame 3 in the stack
```

***

#### Catch a Race Condition with ThreadSanitizer

```bash
# Compile with TSan
g++ -fsanitize=thread -g -O1 server.cpp -o server -lpthread

# Run normally — TSan detects races and prints a report
./server

# TSan output:
# WARNING: ThreadSanitizer: data race (pid=1234)
#   Write of size 8 at 0x... by thread T2:
#     #0 update_counter server.cpp:45
#   Previous read of size 8 at 0x... by thread T1:
#     #0 read_counter server.cpp:30
# → Race between update_counter and read_counter on same variable
```

***

#### Compare `grep` vs `ripgrep` Performance

```bash
# Install ripgrep
brew install ripgrep   # macOS
apt install ripgrep    # Ubuntu

# Benchmark on a large codebase
hyperfine \
  --warmup 3 \
  "grep -r 'TODO' /large/codebase" \
  "rg 'TODO' /large/codebase"

# Expected result: rg is typically 5-20x faster on large directories
```

***

#### Add Structured Logging for Future Debugging

```python
import logging
import json
from datetime import datetime

# Set up JSON structured logging
class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": datetime.utcnow().isoformat(),
            "level": record.levelname,
            "module": record.module,
            "message": record.getMessage(),
            # add any extra fields passed to the logger
            **getattr(record, "extra", {})
        })

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Usage — log with structured context
logger.info("order processed", extra={"order_id": 42, "total": 99.95, "user_id": 7})
# → {"timestamp": "2024-01-15T10:30:00", "level": "INFO", "message": "order processed",
#    "order_id": 42, "total": 99.95, "user_id": 7}
```

Now logs can be queried and aggregated: `cat app.log | jq 'select(.level == "ERROR")'`

***

### Productivity Tricks

#### GDB Pretty Printing and TUI Mode

```bash
# TUI mode: source code on top, GDB commands on bottom
gdb -tui ./program
# Or toggle inside GDB:
Ctrl-x a      # toggle TUI
Ctrl-x 2      # split: source + assembly views

# Python pretty printers (GDB extension)
# Add to ~/.gdbinit:
python
import gdb.printing
end
# Then print STL containers readably: p myVector shows actual elements
```

***

#### One-Line `strace` Summary

```bash
# Count syscalls and time spent in each (great starting point)
strace -c ./program
# Output:
# % time     seconds  usecs/call     calls    errors syscall
# ------ ----------- ----------- --------- --------- --------
#  58.32    0.001456          14       100           read
#  23.41    0.000584           5       120           write
```

This shows you exactly which syscalls dominate — often reveals surprising I/O patterns.

***

#### Python-Specific Profiling

```python
# Built-in cProfile (tracing profiler)
python -m cProfile -s cumulative myscript.py | head -20

# py-spy (sampling profiler, zero overhead, attaches to running process)
# pip install py-spy
py-spy top --pid 12345          # live top-like view
py-spy record -o profile.svg --pid 12345  # flame graph
py-spy record -o profile.svg -- python myscript.py  # from start
```

***

#### Watchpoints: The "Who Changed This?" Superpower

In GDB (and rr), a **watchpoint** halts execution whenever a memory location is read or written:

```gdb
(gdb) watch students[1].id      # break when students[1].id is written
(gdb) rwatch students[1].id     # break when students[1].id is read
(gdb) awatch students[1].id     # break on read OR write
(gdb) c                         # continue — halts at the write
```

Combined with `rr`'s `reverse-continue`, this is the most powerful technique for finding memory corruption: run to the corrupted state, set a watchpoint, reverse-continue to find the writer.

***

### Common Mistakes

#### ❌ Optimizing Before Profiling

**Wrong:**

```python
# "I bet the bottleneck is the JSON parsing"
# Rewrites the JSON parser to be 3x faster
# Result: 2% total speedup because JSON was 5% of runtime
```

**Correct:**

```bash
# Profile first
py-spy record -o profile.svg -- python app.py
# Flame graph shows: 80% of time in a quadratic string concatenation loop
# Fix that loop → 5x overall speedup
```

**Why:** Intuitions about hot spots are almost always wrong. The profiler is the only ground truth.

***

#### ❌ Trusting Wall Clock Time Alone

**Wrong:**

```bash
# "My program takes 10 seconds — must be computation"
time ./program
# real 10.0s, user 0.3s, sys 0.1s
```

In this case, 9.6 seconds is spent **waiting** (network, disk, locks). CPU profiling would show nothing useful. The real fix is reducing the waiting.

**Correct:** Always look at `real` vs `user + sys`. If they diverge, your bottleneck is waiting, not computation.

***

#### ❌ Compiling Without Debug Symbols

**Wrong:**

```bash
gcc -O2 program.c -o program
gdb ./program
# (gdb) bt
# → #0  0x00000000004005a2 in ??  (just memory addresses, no function names)
```

**Correct:**

```bash
gcc -g -O0 program.c -o program   # debug symbols, no optimization
gdb ./program
# (gdb) bt
# → #0  process_order (order=0x7fffffff) at orders.c:42
#   #1  main () at main.c:10
```

**Why:** The `-g` flag embeds source file names and line numbers into the binary. Without it, GDB can't map machine instructions to your code.

***

#### ❌ Using `SIGKILL` to Stop a Process Before Checking the Port

**Wrong:**

```bash
kill -9 $(lsof -ti :8080)   # immediate, no cleanup
# Next run: "Address already in use" — socket in TIME_WAIT
```

**Correct:**

```bash
kill -SIGTERM $(ss -tlnp | grep ':8080' | grep -oP 'pid=\K\d+')
sleep 2   # give it time to close gracefully
# Only escalate to SIGKILL if it doesn't die:
kill -9 <pid>
```

***

#### ❌ Interpreting ASan Output Without Reading the Key Line

ASan reports have a lot of text, but the most important line is:

```
==1234==ERROR: AddressSanitizer: heap-use-after-free on address 0x... at pc ...
READ of size 1 at 0x... thread T0
    #0 0x... in main uaf.c:13       ← THIS is the buggy line
    
0x... is located 0 bytes inside of 32-byte region [0x...,0x...)
freed by thread T0 here:
    #0 0x... in __interceptor_free ...
    #1 0x... in main uaf.c:9        ← THIS is where free() was called
```

Read: "Line 13 read memory that was freed at line 9." Find those two lines, understand the lifetime mismatch, fix it.

***

#### ❌ Logging at Wrong Severity Levels

**Wrong:**

```python
logger.critical("Processing item: %s", item_id)  # not critical at all
logger.debug("FATAL ERROR — database connection failed")  # wrong level
```

**Correct:**

| Level      | When to use                                                       |
| ---------- | ----------------------------------------------------------------- |
| `DEBUG`    | Detailed trace for development; disabled in production            |
| `INFO`     | Normal operational messages ("server started", "order processed") |
| `WARNING`  | Something unexpected but handled ("retrying after timeout")       |
| `ERROR`    | Something failed that shouldn't have; service continues           |
| `CRITICAL` | System cannot continue; immediate attention required              |

***

### Exercises

#### Beginner Exercises

1. **Add logging to a script:** Take any script you've written and replace all `print()` statements with proper `logging` calls at appropriate severity levels. Set the log level to `WARNING` and verify the `DEBUG`/`INFO` messages disappear.
2. **Time and interpret:** Run `time sleep 2`. Explain why `real ≈ 2s` but `user ≈ 0s` and `sys ≈ 0s`. Now run `time find / -name "*.py" 2>/dev/null`. What does the `sys` time represent here?
3. **Monitor live processes:** Install and run `htop`. Sort by CPU with `F6`. Open a second terminal and run `stress -c 2` (or `yes > /dev/null &`). Observe in `htop`. Kill the processes from within `htop` using `F9`.
4. **strace a simple command:** Run `strace ls 2>&1 | tail -20`. Identify the `write` syscall that produces the output. Now run `strace -c ls 2>&1` for a summary. Which syscall is called most often?
5. **Find a process by port:** Start a server with `python -m http.server 4444`. In another terminal, run `ss -tlnp | grep 4444` to find its PID. Kill it with `kill <PID>`. Verify the port is free.

***

#### Intermediate Exercises

6. **Debug with gdb:** Implement the merge sort from the exercises section in C. Compile with `gcc -g -O0`. Run with the test vector. If it produces incorrect output, use GDB: set a breakpoint in `merge()`, step through, and inspect `left[i]` vs `right[j]` to find the wrong index being used.
7. **AddressSanitizer walkthrough:** Save the `uaf.c` code from the exercises section. Compile and run without sanitizers — it probably works silently. Now compile with `-fsanitize=address -g` and run again. Read the full ASan output. Identify: (a) which line triggers the error, (b) which line freed the memory, (c) what the fix is.
8. **Benchmark fd vs find:**

   ```bash
   # Install fd if needed: brew install fd / apt install fd-find
   hyperfine --warmup 3 \
     "find /usr -name '*.so' 2>/dev/null" \
     "fd -e so . /usr"
   ```

   How much faster is `fd`? Hypothesize why from what you know about each tool.
9. **Profiling Python:** Write a Python function that does something computationally wasteful (e.g., repeated string concatenation in a loop, or a naive recursive Fibonacci). Profile it with `python -m cProfile -s cumulative`. Identify the hot function. Optimize it and measure the improvement with `hyperfine`.
10. **Read a flame graph:** Browse to [speedscope.app](https://speedscope.app/) and load a `perf.script` file (or use one of their demo profiles). Identify the widest bar at the top of the flame. What does that function do? Is it something you could optimize?

***

#### Advanced Challenge

11. **rr reverse debugging:** Compile the `corruption.c` program from the exercises section:

    ```bash
    gcc -g corruption.c -o corruption
    ```

    Run it and observe that `students[1].id` becomes corrupted. Now:

    * `rr record ./corruption`
    * `rr replay`
    * Run to the error output with `c`
    * Set a watchpoint: `watch students[1].id`
    * `reverse-continue` to find the exact line that corrupted it

    Explain why the off-by-one in `curve_scores` corrupts a completely different struct field.
12. **ThreadSanitizer on real code:** Write a small multithreaded C program that has a genuine data race (two threads incrementing a shared counter without a mutex). Compile with `-fsanitize=thread -g`. Run it — TSan reports the race. Add a mutex and verify TSan no longer reports any race.
13. **Memory leak hunt with Valgrind Massif:** Write a C program that allocates memory in a loop but only frees it every 10 iterations (simulating a leak pattern). Run with `valgrind --tool=massif`. Use `ms_print` to visualize heap growth over time. Confirm the sawtooth-with-drift pattern that indicates a real leak vs. the pure sawtooth of a correctly managed pool.
14. **Build a benchmarking pipeline:** For two implementations of the same algorithm (e.g., bubble sort vs quicksort in Python), write a benchmark script using `hyperfine --parameter-scan n 100 10000` to measure performance across a range of input sizes. Plot the results (with matplotlib or gnuplot) to visualize the algorithmic complexity difference.

***

### Summary

| Topic                    | Core Idea                                                               |
| ------------------------ | ----------------------------------------------------------------------- |
| **Printf debugging**     | Fast, universal — but transient. Convert to logs when done.             |
| **Logging**              | Printf with severity, structure, and persistence. Leave it in the code. |
| **Interactive debugger** | Pause, step, inspect — use when you don't know what to print            |
| **rr**                   | Record deterministically, replay backwards — for corruption and races   |
| **strace**               | OS-level syscall tracing — what files, network, processes               |
| **bpftrace/eBPF**        | Kernel-level tracing with aggregation — production-safe                 |
| **AddressSanitizer**     | Compile-time instrumentation for memory bugs — fast, requires recompile |
| **Valgrind**             | No-recompile memory/profiling — slower but universal                    |
| **`time`**               | Distinguish real (waiting) vs user (CPU) vs sys (kernel) time           |
| **`perf`**               | Sampling profiler for CPU hot spots — use with flame graphs             |
| **Flame graphs**         | Width = time. Wide bar = hot spot. Zoom to find the root cause.         |
| **`hyperfine`**          | Statistically sound benchmarking for CLI programs                       |
| **LLMs for debugging**   | Explaining cryptic errors, cross-language bugs, stack trace analysis    |

#### Most Important Commands from This Lecture

```
gdb -g prog             – interactive debugger (requires -g at compile)
rr record / rr replay   – deterministic record-replay debugging
strace -e trace=file    – trace syscalls
perf stat / record      – CPU performance counters and profiling
valgrind --leak-check   – memory error detection without recompile
gcc -fsanitize=address  – compile with AddressSanitizer
hyperfine 'cmd1' 'cmd2' – benchmark comparison
ss -tlnp                – network connections with process info
time cmd                – real / user / sys time breakdown
htop / btop             – real-time resource monitoring
```

#### What's Next

In **Lecture 5 – Version Control and Git**, you'll learn how to track, navigate, and collaborate on code history — the single most important tool in a software developer's daily workflow.

***

*Source:* [*MIT Missing Semester – Debugging and Profiling*](https://missing.csail.mit.edu/2026/debugging-profiling/) *Licensed under* [*CC BY-NC-SA 4.0*](https://creativecommons.org/licenses/by-nc-sa/4.0)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://shankar-lab.gitbook.io/mylearning/the-missing-semester-of-your-cs-education/lecture-4-debugging-and-profiling.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
