# Lecture 9 – Code Quality

### Overview

Writing code that works is the baseline. Writing code that *consistently* works — across contributors, operating systems, Python versions, and the passage of time — requires infrastructure. Code quality tools automate the enforcement of correctness and consistency so that human reviewers can focus on logic and design, not style debates and avoidable bugs.

This lecture covers the full quality stack: **formatters** (surface consistency), **linters** (static analysis for deeper issues), **testing** (behavioral verification), **pre-commit hooks** (local enforcement before commits), **continuous integration** (automated enforcement on every push), and **command runners** (ergonomic developer experience across all of the above). It also covers **regular expressions**, a cross-cutting tool for code search, search-and-replace, log parsing, and input validation.

#### Key Takeaways

* **Formatters** eliminate style debates by making code layout non-negotiable. Use `ruff format` (Python), `prettier` (JS/TS), or `gofmt` (Go). Check in your config file. Enable format-on-save in your editor.
* **Linters** go deeper than formatters: they flag antipatterns, potential bugs, and style issues through static analysis. `ruff check` (Python), `eslint` (JS), `clippy` (Rust). Rule sets are configurable, and individual findings can be suppressed per line when a rule misfires.
* **`semgrep`** is a language-agnostic semantic grep that works at the AST level. Write custom rules to enforce project-specific patterns (e.g., ban `subprocess.Popen(..., shell=True)`).
* **Testing** comes in layers: unit (individual functions), integration (module interactions), functional (end-to-end). Regression tests lock in fixed bugs. Property-based tests (Hypothesis) explore edge cases automatically. Mock external dependencies.
* **Code coverage** measures which lines execute during tests. Useful for finding untested paths. Don't over-index on the metric — high coverage doesn't mean high-quality tests.
* **Pre-commit hooks** run formatters and linters automatically before every `git commit`. The `pre-commit` framework makes this portable and shareable across a team.
* **Continuous integration** (GitHub Actions) runs your quality tools on every push and pull request — independent of developer machines, across a matrix of OS/language versions. CI runs in check-only mode, not fix mode.
* **Command runners** (`just`) give developers single-word shortcuts to complex commands: `just lint`, `just test`, `just format`.
* **Regular expressions** match patterns in strings. Core syntax: `.`, `*`, `+`, `?`, `{N}`, `[...]`, `\d`, `\w`, `^`, `$`, `(...)`. Capture groups enable extraction and search-and-replace. Regex is powerful but limited — HTML and recursive grammars require proper parsers.

***

### Core Concepts

#### Formatting: Eliminating Style Debates Permanently

A **code formatter** automatically rewrites your code's surface syntax to comply with a fixed style. This covers things like: quote consistency (`'` vs `"`), spaces around operators (`x+y` → `x + y`), import ordering, line length, trailing commas, bracket placement.

The point is not that any particular style is better. It's that having a *single enforced style* means:

* No diff noise from reformatting across contributors
* No code review time spent on style comments
* No per-engineer style configuration to maintain
* New contributors automatically produce consistent-looking code

**Python — `ruff format` (recommended) or `black`:**

```bash
# Format all files in the project
ruff format .

# Check without modifying (for CI)
ruff format --check .

# Format a single file
ruff format src/main.py
```

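`black` and `ruff format` are intentionally opinionated. Black exposes only a few options (such as line length and `--skip-string-normalization`), and `ruff format` adds a handful more (such as `quote-style` and `indent-style`), but nearly every layout decision is made for you. The design goal is to end bikeshedding: the formatter decides, and the team accepts it.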

**Other languages:**

```bash
# JavaScript / TypeScript
prettier --write .
prettier --check .    # check-only mode for CI

# Go
gofmt -w .            # writes in-place
gofmt -l .            # lists files that need formatting

# Rust
cargo fmt
cargo fmt --check     # CI mode
```

**Editor integration:** Configure your editor to format on save — this means you never manually run the formatter and your code is always compliant before you even commit.

```json
// VS Code settings.json
{
  "editor.formatOnSave": true,
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  }
}
```

**EditorConfig** (`.editorconfig`) communicates project-level settings (indent size, line endings, charset) to any editor without per-editor configuration files:

```ini
# .editorconfig
root = true

[*]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true

[*.js]
indent_size = 2

[*.md]
trim_trailing_whitespace = false
```

***

#### Linting: Static Analysis for Deeper Issues

A **linter** performs static analysis — it reads your code without running it and identifies patterns that are likely bugs, antipatterns, or violations of style conventions that go beyond surface formatting.

Where a formatter asks "does this look right?", a linter asks "does this *mean* what you think it means?"

**Examples of what linters catch:**

```python
# Ruff rule SIM102 — unnecessarily nested if
if condition_a:
    if condition_b:      # ← flagged: combine into a single if
        do_thing()

# Better:
if condition_a and condition_b:
    do_thing()
```

```python
# Ruff rule B006 — mutable default argument (classic Python footgun)
def add_item(item, lst=[]):   # ← flagged: mutable default persists across calls
    lst.append(item)
    return lst

# Better:
def add_item(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst
```
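B006 is easy to dismiss until you watch it misbehave at runtime. The sketch below (function names are illustrative) shows that the default list is created once, when `def` executes, and is then shared by every call that omits the argument:

```python
# The noqa is deliberate: this version exists to demonstrate the bug
def add_item_buggy(item, lst=[]):  # noqa: B006
    lst.append(item)
    return lst


def add_item(item, lst=None):
    # Fixed version: a fresh list is created on each call that omits lst
    if lst is None:
        lst = []
    lst.append(item)
    return lst


first = add_item_buggy("a")
second = add_item_buggy("b")
# Both calls returned the same list object, which now holds both items
print(first is second, second)  # True ['a', 'b']

print(add_item("a"), add_item("b"))  # ['a'] ['b']
```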

```python
# Ruff rule E711 — comparison to None using == instead of is
if result == None:    # ← flagged
    ...
if result is None:    # correct
    ...
```
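E711 is more than a style preference: `==` dispatches to the left operand's `__eq__`, which any class can override, while `is` checks object identity and cannot be fooled. A contrived class (hypothetical, for illustration only) makes the difference visible:

```python
class AlwaysEqual:
    """Contrived class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()

# == calls AlwaysEqual.__eq__, so this prints True even though value is not None
print(value == None)  # noqa: E711

# `is` compares identity, which a class cannot override: prints False
print(value is None)
```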

**Python — `ruff check` (fast, comprehensive):**

```bash
# Check for issues
ruff check .

# Auto-fix fixable issues
ruff check --fix .

# Check a specific file
ruff check src/main.py

# Show explanation for a rule
ruff rule SIM102

# Show the resolved configuration, including the enabled rules
ruff check --show-settings
```

**Configuring rules in `pyproject.toml`:**

```toml
[tool.ruff.lint]
# Enable specific rule sets
select = ["E", "F", "B", "SIM", "I", "S"]
# E = pycodestyle errors
# F = Pyflakes (unused imports, undefined names)
# B = flake8-bugbear (likely bugs and antipatterns)
# SIM = flake8-simplify
# I = isort (import ordering)
# S = flake8-bandit (security checks; source of the S101/S307 codes used below)

# Disable specific rules that produce too many false positives
ignore = ["E501"]   # line too long (handled by formatter instead)

# Per-file exceptions
[tool.ruff.lint.per-file-ignores]
"tests/**" = ["S101"]   # allow assert statements in tests
```

**Suppressing a specific line:**

```python
# Justification: the expression comes from a trusted, checked-in config file
result = eval(config_expr)  # noqa: S307
```

***

#### `semgrep`: Semantic Pattern Matching

`semgrep` (semantic grep) works at the **AST level** rather than the character level. This means it understands code structure, not just text. It supports many languages and lets you write custom rules for project-specific patterns.

```bash
# Find dangerous shell=True usage in Python
semgrep -l python -e "subprocess.Popen(..., shell=True, ...)"

# Why this is better than grep:
# grep would miss: subprocess.Popen(cmd, shell = True)  (spaces around =)
# grep would miss: subprocess.Popen(
#                      cmd,
#                      shell=True       (multi-line)
#                  )
# semgrep catches all of these because it matches the AST, not text
```
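To make the "AST, not text" idea concrete, here is a rough sketch of the same check written with Python's standard `ast` module. It is not how semgrep is implemented, and it only recognizes the literal `subprocess.Popen(..., shell=True)` spelling, but it shows why spacing and line breaks stop mattering once you match on parsed structure:

```python
import ast


def find_shell_true_popen(source: str) -> list[int]:
    """Return the line numbers of subprocess.Popen(...) calls that pass shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only consider calls whose callee is an attribute access like `x.y`
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        func = node.func
        # ...and specifically `subprocess.Popen`
        if not (func.attr == "Popen" and isinstance(func.value, ast.Name)
                and func.value.id == "subprocess"):
            continue
        # Keyword arguments are structured nodes, so whitespace and newlines are irrelevant
        for kw in node.keywords:
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''
import subprocess
subprocess.Popen(cmd, shell = True)
subprocess.Popen(
    cmd,
    shell=True,
)
subprocess.Popen(["ls", "-l"])
'''

print(find_shell_true_popen(SAMPLE))  # [3, 4]
```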

**Writing a custom rule (YAML format):**

```yaml
# .semgrep/no-shell-true.yaml
rules:
  - id: no-subprocess-shell-true
    patterns:
      - pattern: subprocess.Popen(..., shell=True, ...)
    message: "shell=True in Popen is a security risk. Pass a list instead."
    languages: [python]
    severity: ERROR
```

```bash
# Run all rules in .semgrep/
semgrep --config .semgrep/ .

# Run the community ruleset for Python security
semgrep --config "p/python"
```

***

#### Testing: Behavioral Verification at Multiple Levels

Testing is writing code that exercises your code and raises errors when it misbehaves. Tests differ from linting in that they verify *runtime behavior*, not static properties.

**Test granularity levels:**

| Level                | What it tests                 | Scope         | Speed     |
| -------------------- | ----------------------------- | ------------- | --------- |
| **Unit**             | Individual functions/methods  | Isolated      | Very fast |
| **Integration**      | Module-to-module interactions | Wider         | Medium    |
| **Functional / E2E** | Full user-visible scenarios   | Entire system | Slow      |

**Python — `pytest`:**

```python
# tests/test_extractor.py
import re

import pytest


# Defined inline so the example is self-contained; in a real project this
# function lives in src/ and the test module imports it.
def extract_urls(text: str) -> list[str]:
    """Extract http/https URLs from text."""
    return re.findall(r'https?://\S+', text)


# Unit tests
def test_extracts_single_url():
    assert extract_urls("Visit https://example.com today") == ["https://example.com"]

def test_empty_string_returns_empty_list():
    assert extract_urls("") == []

def test_no_urls_returns_empty_list():
    assert extract_urls("no links here") == []

def test_extracts_multiple_urls():
    result = extract_urls("https://a.com and https://b.com")
    assert result == ["https://a.com", "https://b.com"]

# Parametrized tests: run the same test body with multiple inputs
@pytest.mark.parametrize("text,expected", [
    ("https://example.com", ["https://example.com"]),
    ("", []),
    ("no links", []),
    ("http://a.com https://b.com", ["http://a.com", "https://b.com"]),
])
def test_extract_urls_parametrized(text, expected):
    assert extract_urls(text) == expected
```

```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run a specific file
pytest tests/test_extractor.py

# Run tests matching a pattern
pytest -k "url"          # runs test_extracts_single_url, test_extracts_multiple_urls, etc.
pytest -m "not slow"     # excludes tests marked @pytest.mark.slow

# Stop on first failure
pytest -x

# Show locals on failure
pytest -l
```

**Regression tests** — lock in fixed bugs:

```python
def test_regression_issue_392_null_user_crash(client):
    """
    Regression test for issue #392.
    Previously, session expiry mid-request caused current_user to be None,
    which triggered an AttributeError (an HTTP 500) rather than a redirect.
    `client` is a test-client fixture, e.g. defined in conftest.py.
    """
    response = client.get("/profile", headers={"X-Session": "expired-token"})
    # Expect a redirect to the login page, not a 500 error
    assert response.status_code == 302
    assert "/login" in response.headers["Location"]
```

**Property-based testing with `hypothesis`** — the library generates random inputs to find edge cases you wouldn't have thought of:

```python
from hypothesis import given, strategies as st

@given(st.text())
def test_extract_urls_never_raises(text):
    """extract_urls should handle any string input without raising an exception."""
    result = extract_urls(text)
    assert isinstance(result, list)

@given(st.lists(st.text(min_size=1)))
def test_roundtrip_encode_decode(strings):
    """Encoding then decoding should return the original list."""
    encoded = encode(strings)   # encode/decode: the project's own serializer (not defined here)
    decoded = decode(encoded)
    assert decoded == strings
```
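The roundtrip property above needs concrete `encode`/`decode` functions. As a minimal sketch, assuming a hypothetical length-prefixed format (`<length>:<text>` per item), the stdlib-only loop below checks the same property on random inputs. It is a stand-in to illustrate the idea, not a replacement for Hypothesis, which also shrinks a failing input down to a minimal example:

```python
import random
import string


def encode(strings: list[str]) -> str:
    """Hypothetical format: each item becomes '<length>:<text>'."""
    return "".join(f"{len(s)}:{s}" for s in strings)


def decode(data: str) -> list[str]:
    """Inverse of encode: read a length, skip the colon, slice that many characters."""
    items, i = [], 0
    while i < len(data):
        colon = data.index(":", i)
        length = int(data[i:colon])
        start = colon + 1
        items.append(data[start : start + length])
        i = start + length
    return items


def check_roundtrip(trials: int = 500, seed: int = 0) -> None:
    """Property: decode(encode(x)) == x for randomly generated lists of strings."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    alphabet = string.printable + "é"  # includes ':' and digits, the tricky cases
    for _ in range(trials):
        strings = [
            "".join(rng.choices(alphabet, k=rng.randint(1, 10)))
            for _ in range(rng.randint(0, 5))
        ]
        assert decode(encode(strings)) == strings, f"roundtrip failed for {strings!r}"


check_roundtrip()
```

Choosing inputs that contain the format's own delimiter (`:`) and digits is the point: those are the cases a hand-written example-based test is most likely to skip.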

**Mocking external dependencies:**

```python
from unittest.mock import patch

import requests

from myapp.client import fetch_user_profile   # placeholder package for the code under test

def test_fetch_user_profile_handles_api_timeout():
    """Verify graceful handling when the external profile API times out."""
    with patch("myapp.client.requests.get") as mock_get:
        mock_get.side_effect = requests.Timeout()

        result = fetch_user_profile(user_id=42)

    assert result is None   # or whatever the expected fallback behavior is
    mock_get.assert_called_once()
```

***

#### Code Coverage

Code coverage tracks which lines execute when your tests run. Lines that never execute are untested — any bug in them will be invisible to your test suite.

```bash
# Run tests with coverage (pytest-cov plugin)
pytest --cov=src --cov-report=term-missing

# Generate an HTML report (visual, line-by-line)
pytest --cov=src --cov-report=html
open htmlcov/index.html   # macOS (xdg-open on Linux); browse covered / uncovered lines
```

Coverage configuration in `pyproject.toml`:

```toml
[tool.coverage.run]
source = ["src"]
omit = ["tests/*", "*/migrations/*"]

[tool.coverage.report]
fail_under = 80    # fail CI if coverage drops below 80%
show_missing = true
```

**What coverage tells you and what it doesn't:**

```python
# The only test calls divide(10, 2), so coverage flags the branch it never reaches:
def divide(a, b):
    if b == 0:       # ← covered (the condition is evaluated)
        return None  # ← NOT covered: the zero case is never tested
    return a / b     # ← covered
# --cov-report=term-missing lists the `return None` line, pointing straight at the gap
```

Coverage shows you *where* you're not testing. It doesn't guarantee that the tests you do have are *good*. A test that calls a function but never asserts anything meaningful gives 100% coverage and zero confidence.

> Don't over-index on the coverage number. 80% meaningful tests > 100% coverage with trivial assertions.
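To make the "100% coverage, zero confidence" point concrete, the two tests below (names are illustrative) produce identical line coverage for `divide`; only the second would catch a regression:

```python
def divide(a, b):
    if b == 0:
        return None
    return a / b


def test_divide_executes_everything():
    # Runs every line of divide(), so line coverage is 100%...
    divide(10, 2)
    divide(1, 0)
    # ...but nothing is asserted: a wrong return value would still pass


def test_divide_checks_behavior():
    # Same coverage, but now a wrong result fails the test
    assert divide(10, 2) == 5
    assert divide(1, 0) is None
```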

***

#### Pre-commit Hooks

A **Git pre-commit hook** runs automatically before every `git commit`. If it exits with a non-zero code, the commit is blocked. This enforces quality checks locally — catching issues before they ever reach CI or code review.

The [`pre-commit`](https://pre-commit.com/) framework makes hooks portable and shareable across a team via a single config file:

```bash
# Install pre-commit
pip install pre-commit

# Install the hooks defined in .pre-commit-config.yaml
pre-commit install

# Run all hooks manually against all files (useful for initial setup)
pre-commit run --all-files
```

**`.pre-commit-config.yaml`:**

```yaml
repos:
  # Ruff: lint and format (Python)
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0
    hooks:
      - id: ruff              # linter
        args: [--fix]         # auto-fix what can be auto-fixed
      - id: ruff-format       # formatter

  # General file hygiene
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-merge-conflict
      - id: detect-private-key   # prevent accidentally committing secrets

  # Type checking
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.13.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests]
```

When a developer commits, the hooks run automatically:

```
$ git commit -m "Add URL extractor"
ruff.....................................................................Passed
ruff-format..............................................................Failed
- hook id: ruff-format
- files were modified by this hook
  src/extractor.py
```

If ruff-format modified files, the commit is blocked. The developer stages the reformatted files and commits again.
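Under the hood, a hook is just a program: `pre-commit` runs it with the staged filenames as arguments and blocks the commit if it exits non-zero. As a minimal sketch (a simplified, check-only cousin of the `trailing-whitespace` hook, not its actual source), a project-specific hook in Python could look like this:

```python
import sys
from pathlib import Path


def main(argv: list[str]) -> int:
    """Report lines with trailing whitespace; return 1 if any are found, else 0."""
    status = 0
    for name in argv:
        text = Path(name).read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line != line.rstrip():
                print(f"{name}:{lineno}: trailing whitespace")
                status = 1
    return status


if __name__ == "__main__":
    # pre-commit passes the staged file paths as command-line arguments
    sys.exit(main(sys.argv[1:]))
```

A script like this can be registered as a `repo: local` entry in `.pre-commit-config.yaml`; the exit code is the entire contract between the hook and Git.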

***

#### Continuous Integration

**CI** moves quality enforcement out of local developer machines and into a neutral, reproducible server environment that runs on every push and pull request.

This catches:

* Issues that only appear on certain OS/Python version combinations
* Regressions introduced by a commit that "worked locally"
* Style/lint violations that slipped past pre-commit
* Breaking changes in external dependencies (when run on a schedule)

**GitHub Actions — basic CI workflow:**

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install uv
        run: pip install uv

      - name: Install dependencies
        run: uv pip install --system -e ".[dev]"

      - name: Check formatting
        run: ruff format --check .   # ← check-only, not fix

      - name: Lint
        run: ruff check .

      - name: Type check
        run: mypy src/

      - name: Run tests
        run: pytest --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: ./coverage.xml
          token: ${{ secrets.CODECOV_TOKEN }}   # v4 requires an upload token for most repositories
```

**Matrix builds** — test across multiple Python versions and operating systems in parallel:

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[dev]"
      - run: pytest
```

This produces 9 parallel jobs (3 OS × 3 Python versions), each running your test suite independently.
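The job count is the Cartesian product of the matrix axes. A small Python sketch of that expansion (ignoring the `include`/`exclude` keys that GitHub Actions also supports):

```python
from itertools import product

matrix = {
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
    "python-version": ["3.11", "3.12", "3.13"],
}

# One job per combination of axis values
jobs = [dict(zip(matrix, combo)) for combo in product(*matrix.values())]

print(len(jobs))   # 9
print(jobs[0])     # {'os': 'ubuntu-latest', 'python-version': '3.11'}
```

Adding a fourth Python version would mean 12 jobs, so each new axis value multiplies CI cost rather than adding to it.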

**CI is check-only:** CI tools run in `--check` mode, never `--fix` mode. The CI job should fail when code doesn't meet the quality bar, not silently reformat it. Developers fix locally and push again.

**Continuous deployment:** CI can go beyond checking — it can deploy. GitHub Pages auto-builds and deploys this course's website on every `git push`. Docker images, compiled binaries, and Python wheels can be built and published to registries from CI on every tagged release.

**Status badges** (added to your README to show live CI state):

```markdown
[![CI](https://github.com/user/repo/actions/workflows/ci.yml/badge.svg)](https://github.com/user/repo/actions/workflows/ci.yml)
[![Coverage](https://codecov.io/gh/user/repo/badge.svg)](https://codecov.io/gh/user/repo)
```

***

#### Command Runners: Ergonomic Developer Experience

As your quality infrastructure grows, commands get long:

```bash
uv run ruff check --fix .
uv run ruff format .
uv run mypy src/ --strict
uv run pytest --cov=src --cov-report=term-missing -x
```

A **command runner** (`just`) creates short, memorable aliases for these commands, defined in a `Justfile` at the project root:

```makefile
# Justfile
default:
    just --list

# Format all code
format:
    ruff format .

# Lint with auto-fix
lint:
    ruff check --fix .

# Type check
typecheck:
    mypy src/ --strict

# Run tests with coverage
test:
    pytest --cov=src --cov-report=term-missing -x

# Run all quality checks (check-only, no auto-fix — mirrors CI)
check:
    ruff format --check .
    ruff check .
    mypy src/ --strict
    pytest

# Set up dev environment from scratch
setup:
    uv venv
    uv pip install -e ".[dev]"
    pre-commit install
```

```bash
just lint       # runs: ruff check --fix .
just test       # runs: pytest --cov=src --cov-report=term-missing -x
just check      # runs the full check suite (same as CI)
just setup      # onboards a new developer in one command
```

For **Python projects with `hatch`** or **Node.js with npm**, the command runner is built into the project config:

```toml
# pyproject.toml (hatch)
[tool.hatch.envs.default.scripts]
lint = "ruff check --fix ."
test = "pytest --cov=src"
check = ["ruff format --check .", "ruff check .", "mypy src/", "pytest"]
```

In `package.json` (npm), which is plain JSON and therefore cannot contain comments:

```json
{
  "scripts": {
    "lint": "eslint src/",
    "format": "prettier --write .",
    "test": "jest --coverage",
    "check": "prettier --check . && eslint src/ && jest"
  }
}
```

***

#### Regular Expressions

A **regular expression** (regex) is a pattern language for describing sets of strings. Regex appears throughout the quality toolchain: code search with `grep` and `ag`, search-and-replace in editors and `sed`, semgrep's regex-based rule operators, log parsing, and validation logic in code. (Note that `pytest -k` does *not* take a regex; it matches substrings of test names combined with `and`/`or`/`not`.)

**The core syntax (Python flavor — `re` module):**

| Pattern  | Matches                           | Example                                   |
| -------- | --------------------------------- | ----------------------------------------- |
| `abc`    | Literal string "abc"              | `abc` matches `"abc"`                     |
| `.`      | Any single character except newline | `a.c` matches `"abc"`, `"a2c"`          |
| `[abc]`  | Any one of a, b, c                | `[aeiou]` matches any vowel               |
| `[^abc]` | Any character except a, b, c      | `[^0-9]` matches non-digit                |
| `[a-z]`  | Any character in range            | `[a-f]` matches "c" but not "q"           |
| `a\|b`   | Either a or b                     | `cat\|dog` matches "cat" or "dog"         |
| `\d`     | Any digit (Unicode-aware in Python 3; only 0–9 with `re.ASCII`) | `\d+` matches "123"     |
| `\w`     | Any word character: letters, digits, `_` (only `[a-zA-Z0-9_]` with `re.ASCII`) | `\w+` matches "hello\_world" |
| `\b`     | Word boundary                     | `\bcat\b` matches "cat" but not "catch"   |
| `^`      | Start of string (of each line with `re.MULTILINE`) | `^import` matches "import os" but not "# import os" |
| `$`      | End of string (of each line with `re.MULTILINE`)   | `\.py$` matches filenames ending in .py |
| `?`      | Zero or one                       | `colou?r` matches "color" or "colour"     |
| `*`      | Zero or more                      | `a*` matches "", "a", "aaa"               |
| `+`      | One or more                       | `\d+` matches "1", "99" but not ""        |
| `{N}`    | Exactly N                         | `\d{4}` matches exactly 4 digits          |
| `{N,M}`  | Between N and M                   | `\d{2,4}` matches 2, 3, or 4 digits       |
| `(...)`  | Capture group                     | `(\d{4})-(\d{2})` captures year and month |
| `\.`     | Literal dot                       | `3\.14` matches "3.14" not "3X14"         |

**Real examples:**

```python
import re

# Match a date in YYYY-MM-DD format
re.match(r'\d{4}-\d{2}-\d{2}', '2026-01-14')     # ✅
re.match(r'\d{4}-\d{2}-\d{2}', '2026-01-99')     # ✅ (regex doesn't validate values)

# Extract month from a date using a capture group
re.match(r'\d{4}-(\d{2})-\d{2}', '2026-01-14').group(1)   # → '01'

# Find all URLs in text
re.findall(r'https?://\S+', 'Visit https://example.com and http://other.org')
# → ['https://example.com', 'http://other.org']

# Validate email (basic — just structure, not correctness)
re.match(r'.+@.+\..+', 'user@example.com')  # ✅
re.match(r'.+@.+\..+', 'notanemail')         # ✗

# Extract log fields from Nginx access log
pattern = r'(\d+\.\d+\.\d+\.\d+) .* "(\w+) (/\S*) HTTP/[\d.]+" (\d+)'
m = re.match(pattern, '169.254.1.1 - - [09/Jan/2026] "GET /feed.xml HTTP/2.0" 200 2995')
# m.group(1) → '169.254.1.1'  (IP)
# m.group(2) → 'GET'          (method)
# m.group(3) → '/feed.xml'    (path)
# m.group(4) → '200'          (status)
```

**Greedy vs. non-greedy quantifiers:**

```python
text = '{"name": "Alice", "city": "NYC"}'

# Greedy — matches as much as possible
re.search(r'"name": "(.+)"', text).group(1)
# → 'Alice", "city": "NYC'   ← grabbed too much!

# Non-greedy (add ?) — matches as little as possible
re.search(r'"name": "(.+?)"', text).group(1)
# → 'Alice'   ← correct
```

**Capture groups in editor search-and-replace:**

```
# Convert Python 2 print statements to Python 3 function calls
# Find:    print (.+)
# Replace: print($1)

# VS Code uses $1, $2 for groups
# Vim uses \1, \2
```
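The same capture-group substitution works in Python, where `re.sub` uses `\1`-style backreferences like Vim. This sketch anchors the pattern to the start of each line with `re.MULTILINE` and keeps indentation in an extra group; like the editor version, it is a rough textual rewrite, not a full Python 2 to 3 converter:

```python
import re

legacy = 'print "hello"\nif ok:\n    print name\n'

# Group 1 keeps the indentation, group 2 captures the printed expression
converted = re.sub(r"^([ \t]*)print (.+)$", r"\1print(\2)", legacy, flags=re.MULTILINE)

print(converted)
```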

**Regex limitations — when not to use regex:**

Regex is powerful but cannot express all patterns. HTML, XML, JSON, and most programming languages are not regular languages — they cannot be correctly parsed with regex. Use the right tool:

```python
# ❌ Don't parse HTML with regex
title = re.search(r'<title>(.*?)</title>', html).group(1)  # breaks on <title lang="en">, <TITLE>, multi-line titles

# ✅ Use a proper HTML parser
from bs4 import BeautifulSoup
title = BeautifulSoup(html, 'html.parser').title.string

# ❌ Don't parse JSON with regex
name = re.search(r'"name": "(.+?)"', json_str).group(1)   # breaks on escaped quotes

# ✅ Use the JSON parser
import json
name = json.loads(json_str)["name"]
```
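To see those failure modes in action, the sketch below runs both regexes against inputs a real page or API could plausibly produce. `TitleParser` is a hypothetical stdlib-only stand-in for BeautifulSoup, built on `html.parser.HTMLParser`, which lowercases tag names and hands attributes over separately:

```python
import json
import re
from html.parser import HTMLParser

html = '<html><head><TITLE lang="en">Quarterly\n Report</TITLE></head></html>'
json_str = '{"name": "Al \\"The Pal\\" Smith"}'

# Regex attempts: the title pattern finds nothing, the JSON pattern stops at the escaped quote
print(re.search(r"<title>(.*?)</title>", html))            # None
print(re.search(r'"name": "(.+?)"', json_str).group(1))    # Al \


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attributes are passed separately in attrs
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = TitleParser()
parser.feed(html)
parser.close()
print(repr(parser.title))             # 'Quarterly\n Report'
print(json.loads(json_str)["name"])   # Al "The Pal" Smith
```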

For sophisticated custom parsing needs beyond regex, look at PEG parsers like [`pyparsing`](https://github.com/pyparsing/pyparsing).

***

### Mental Models

#### The Quality Stack: Local to Remote

Quality enforcement happens at multiple layers. Each layer catches different things and has different latency (time to feedback):

```
Fastest feedback                                        Slowest feedback
────────────────────────────────────────────────────────────────────────
Editor       Pre-commit     Local test    CI             Production
(real-time)  (on commit)    (on demand)   (on push/PR)   (post-deploy)

Format-on-    Format        pytest        Format --check  Monitoring
save          Lint          Coverage      Lint            Alerting
LSP errors    Type check    Integration   Type check      Error tracking
Inline lint                 tests         Full test
                                          matrix
```

**The principle:** Push enforcement as far left (toward the editor) as possible. A formatting error caught on save costs nothing. The same error caught in CI costs a push cycle. The same error caught in code review costs someone's time. In production, there's no catching it — there's only damage control.

***

#### CI as a Neutral Witness

Developer machines are heterogeneous — different OS versions, Python versions, locally installed tools, environment variables, cached state. CI is none of these things: it's a fresh, reproducible environment that knows nothing about your machine.

```
Developer A (macOS, Python 3.13):     Tests pass ✅
Developer B (Ubuntu, Python 3.11):    Tests pass ✅
CI (Ubuntu, Python 3.11 + 3.12 + 3.13): Fails on Python 3.11 ❌
  → reveals a 3.11 incompatibility that A and B both missed
```

CI doesn't trust your local environment. It rebuilds from scratch every run. That's exactly what makes it trustworthy.

***

#### Regex: Matching Against a Grammar

Think of a regex as a compact grammar that describes exactly the set of strings you want to match. Building one is an iterative process:

```
Goal: match ISO dates, not reject valid ones, not accept invalid ones

Start:      \d{4}-\d{2}-\d{2}
            → matches 2026-01-14 ✅
            → matches 2026-01-99 ✅ (accepts an invalid day: too permissive)
            → rejects 2026-1-4   ✗ (single-digit month: right for strict ISO 8601, too strict for loose input)

Refine month to 01-12 and day to 01-31:
            \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
            → rejects month 13, day 32 ✅
            → still accepts Feb 31 ✗ (calendar validation needs code, not regex)

Lesson: regex validates *structure*, not *semantics*.
        For semantic validation, use code.
```

Use [regex101.com](https://regex101.com/) to build and debug interactively. LLMs are effective at generating regex patterns from natural-language descriptions — try describing what you want in plain English and let the model produce a starting pattern.

***

### Commands and Syntax

#### Python Quality Toolchain

```bash
# ── Formatting (ruff) ──────────────────────────────────────────────
ruff format .                      # format all files
ruff format --check .              # check-only (for CI)
ruff format src/main.py            # single file

# ── Linting (ruff) ─────────────────────────────────────────────────
ruff check .                       # check for issues
ruff check --fix .                 # auto-fix what can be fixed
ruff check --select E,F,B .        # only these rule sets
ruff rule SIM102                   # show documentation for a rule

# ── Type checking (mypy) ───────────────────────────────────────────
mypy src/                          # type-check the src directory
mypy src/ --strict                 # strict mode (all optional checks)
mypy src/ --ignore-missing-imports # suppress missing stub warnings

# ── Testing (pytest) ───────────────────────────────────────────────
pytest                             # run all tests
pytest -v                          # verbose: show each test name
pytest -x                          # stop on first failure
pytest -k "url"                    # run tests matching pattern
pytest tests/test_extractor.py     # single file
pytest --cov=src                   # coverage (requires pytest-cov)
pytest --cov=src --cov-report=html # generate HTML coverage report

# ── Pre-commit ─────────────────────────────────────────────────────
pip install pre-commit
pre-commit install                 # install hooks in .git/hooks/
pre-commit run --all-files         # run all hooks on every file
pre-commit run ruff                # run specific hook only
pre-commit autoupdate              # update hook versions

# ── semgrep ────────────────────────────────────────────────────────
semgrep -l python -e "subprocess.Popen(..., shell=True, ...)"
semgrep --config "p/python" .      # community security ruleset
semgrep --config .semgrep/ .       # custom local rules
```

***

#### Regex in Python (`re` module)

```python
import re

# ── Matching ───────────────────────────────────────────────────────
re.match(pattern, string)      # match at the START of string
re.search(pattern, string)     # match ANYWHERE in string
re.fullmatch(pattern, string)  # entire string must match

# ── Finding all matches ────────────────────────────────────────────
re.findall(pattern, string)    # list of all non-overlapping matches (the captured groups instead, if the pattern has any)
re.finditer(pattern, string)   # iterator of match objects

# ── Substitution ───────────────────────────────────────────────────
re.sub(pattern, replacement, string)
re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', '2026-01-14')
# → '14/01/2026'  (reorder date components using capture groups)

# ── Splitting ──────────────────────────────────────────────────────
re.split(r'\s+', 'hello   world  foo')
# → ['hello', 'world', 'foo']

# ── Flags ──────────────────────────────────────────────────────────
re.search(pattern, string, re.IGNORECASE)    # case-insensitive
re.search(pattern, string, re.MULTILINE)     # ^ and $ match line boundaries
re.search(pattern, string, re.DOTALL)        # . matches newlines too

# ── Compiled patterns (for repeated use) ───────────────────────────
date_re = re.compile(r'\d{4}-\d{2}-\d{2}')
date_re.findall(large_document)    # reuses the parsed pattern; re also caches recent patterns, so the gain is modest
```
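One `findall` detail trips people up: once the pattern contains capture groups, it returns the groups instead of the whole match. When you need more than a list of strings, `finditer` with named groups (`(?P<name>...)`) is usually clearer, since each match object also carries its position:

```python
import re

text = "Released 2026-01-14, patched 2026-02-03."

# No groups: whole matches
whole = re.findall(r"\d{4}-\d{2}-\d{2}", text)
# → ['2026-01-14', '2026-02-03']

# Groups present: tuples of captured groups instead
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
# → [('2026', '01', '14'), ('2026', '02', '03')]

# finditer + named groups: look fields up by name and keep match positions
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
found = [(m["year"], m["month"], m.start()) for m in date_re.finditer(text)]
# → [('2026', '01', 9), ('2026', '02', 29)]
```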

***

#### Regex in the Command Line and Editor

```bash
# grep with regex
grep -rnE 'subprocess\.Popen.*shell=True' --include='*.py' .   # Extended regex, recursive
grep -rnE 'TODO|FIXME|HACK' src/                               # -E needed: | is literal in basic regex

# ag (the silver searcher) — fast, regex-aware
ag 'import .* as .*' src/    # find all aliased imports in Python

# sed for inline substitution (GNU sed; BSD/macOS sed needs -i '' instead of -i)
sed -i 's/foo/bar/g' file.txt                         # replace all foo → bar
sed -i 's/\(print\) \(.*\)/\1(\2)/g' legacy.py       # print → print()

# Testing a pattern against input
echo "2026-01-14" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'   # POSIX ERE has no \d, so use [0-9]
```

```
VS Code Find (Ctrl+H) — regex search-and-replace:
  Find:    ^- (.*)$        (lines starting with "- ")
  Replace: * $1            (rewrite each "- " bullet as "* ")
  Uses $1, $2 for capture groups

Vim substitution:
  :%s/\(\d\{4\}\)-\(\d\{2\}\)/\2\/\1/g   (swap year-month order)
  Uses \1, \2 for capture groups
```

***

### System Diagrams

#### The Full Quality Pipeline

```mermaid
graph LR
    CODE["Code\n(you write it)"]

    subgraph "Editor (real-time)"
        FOS["Format on save\n(ruff format)"]
        LSP["LSP inline errors\n(type errors, undefined names)"]
    end

    subgraph "Pre-commit (on git commit)"
        PC_FMT["ruff format"]
        PC_LINT["ruff check --fix"]
        PC_SEC["detect-private-key\ncheck-yaml / check-json"]
    end

    subgraph "Local (on demand)"
        TEST["pytest\n--cov=src"]
        MYPY["mypy src/"]
    end

    subgraph "CI — GitHub Actions (on push / PR)"
        CI_FMT["ruff format --check\n(check-only)"]
        CI_LINT["ruff check"]
        CI_TYPE["mypy src/"]
        CI_TEST["pytest matrix\n(3 OS × 3 Python versions)"]
        COV["codecov upload"]
    end

    CODE --> FOS --> LSP
    CODE -->|"git commit"| PC_FMT --> PC_LINT --> PC_SEC
    CODE -->|"just test"| TEST
    CODE -->|"just typecheck"| MYPY
    CODE -->|"git push"| CI_FMT --> CI_LINT --> CI_TYPE --> CI_TEST --> COV

    style COV fill:#2d4a22,color:#fff
    style CI_TEST fill:#1a3a5c,color:#fff
```

***

#### Linting vs. Formatting vs. Type Checking

```mermaid
graph TB
    subgraph "Formatter (surface syntax)"
        F1["Quote style: ' vs \""]
        F2["Spaces around operators"]
        F3["Import ordering"]
        F4["Line length"]
        F5["Trailing commas"]
    end

    subgraph "Linter (static analysis)"
        L1["Antipatterns\n(mutable default args)"]
        L2["Likely bugs\n(== None instead of is None)"]
        L3["Dead code\n(unused imports/variables)"]
        L4["Complexity\n(nested ifs that can be collapsed)"]
        L5["Security patterns\n(shell=True, eval())"]
    end

    subgraph "Type checker (semantic correctness)"
        T1["Incorrect argument types"]
        T2["None not handled"]
        T3["Missing return types"]
        T4["Incompatible assignments"]
        T5["Missing attributes"]
    end

    DEPTH["Increasing depth\nof analysis →"]
    F1 -.-> L1 -.-> T1

    style DEPTH fill:transparent,color:#888
```
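One linter rule from the middle column, made concrete: pycodestyle's E711 (also implemented in ruff) flags `== None` because `==` dispatches to the left operand's `__eq__`, which a class can override, while `is` compares identity. The class below is hypothetical and exists only to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims to equal everything."""

    def __eq__(self, other: object) -> bool:
        return True


obj = AlwaysEqual()

eq_result = obj == None  # noqa: E711 (deliberate, to show the pitfall)
is_result = obj is None  # identity check: cannot be overridden

print(eq_result)  # True, even though obj is clearly not None
print(is_result)  # False
```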

***

#### CI Matrix Build

```mermaid
graph TD
    PUSH["git push to main / PR opened"]

    subgraph "Matrix: 3 OS × 3 Python = 9 parallel jobs"
        J1["ubuntu / py3.11"]
        J2["ubuntu / py3.12"]
        J3["ubuntu / py3.13"]
        J4["macos / py3.11"]
        J5["macos / py3.12"]
        J6["macos / py3.13"]
        J7["windows / py3.11"]
        J8["windows / py3.12"]
        J9["windows / py3.13"]
    end

    RESULT["All 9 pass → ✅ PR can merge\nAny fail → ❌ PR blocked"]

    PUSH --> J1 & J2 & J3 & J4 & J5 & J6 & J7 & J8 & J9
    J1 & J2 & J3 & J4 & J5 & J6 & J7 & J8 & J9 --> RESULT

    style RESULT fill:#2d4a22,color:#fff
```
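GitHub Actions builds a matrix by taking every combination of the listed values (before any `include`/`exclude` adjustments). This Python sketch, with hypothetical labels matching the diagram, shows the same expansion and the all-must-pass rule. In practice, a failing job only blocks the merge if the checks are marked as required in the repository's branch protection settings.

```python
from itertools import product

# Hypothetical axes matching the diagram.
oses = ["ubuntu", "macos", "windows"]
pythons = ["3.11", "3.12", "3.13"]

# One job per (os, python) combination: 3 × 3 = 9.
jobs = [f"{os_name} / py{py}" for os_name, py in product(oses, pythons)]


def pr_can_merge(results: dict[str, bool]) -> bool:
    """The gate is an AND over every job: a single failure blocks the merge."""
    return all(results.values())


print(len(jobs))  # 9
print(jobs[0])    # ubuntu / py3.11
```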

***

#### Regex Pattern Building Process

```mermaid
graph TD
    GOAL["Define what strings you want to match\n(and critically: NOT match)"]
    DRAFT["Write initial pattern\non regex101.com"]
    TEST["Test against:\n✅ valid examples (should match)\n❌ invalid examples (should not)"]
    FAIL["Pattern too greedy?\nToo strict?\nMissing edge case?"]
    REFINE["Refine the pattern\n(add anchors, adjust quantifiers,\nuse non-greedy ?, add groups)"]
    DONE["Pattern handles all test cases ✅"]
    LIMIT["Pattern getting very complex?\nConsider: is this the right tool?\nMaybe use a proper parser."]

    GOAL --> DRAFT --> TEST
    TEST -->|"All pass"| DONE
    TEST -->|"Some fail"| FAIL --> REFINE --> TEST
    DONE -->|"Very complex"| LIMIT

    style DONE fill:#2d4a22,color:#fff
    style LIMIT fill:#4a2d00,color:#fff
```
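The "test" step of this process can be automated with a tiny harness that reports which examples a pattern gets wrong. The date pattern and example lists below are illustrative assumptions, not a canonical date regex:

```python
import re

# Illustrative target: ISO-style dates, month 01-12, day 01-31.
DATE_RE = re.compile(r'\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])')

should_match = ["2026-01-14", "1999-12-31", "2024-02-29"]
should_not_match = ["2026-13-01", "2026-1-14", "2026-01-14x", "26-01-14"]


def report(pattern: re.Pattern[str], good: list[str], bad: list[str]):
    """Return (missed, wrongly_matched): the examples the pattern gets wrong."""
    # fullmatch requires the whole string to match, so no ^/$ anchors are needed.
    missed = [s for s in good if not pattern.fullmatch(s)]
    wrongly_matched = [s for s in bad if pattern.fullmatch(s)]
    return missed, wrongly_matched


missed, wrongly_matched = report(DATE_RE, should_match, should_not_match)
print(missed, wrongly_matched)  # [] []
```

This pattern still accepts impossible dates such as `2026-02-31`. That is the point where the diagram's advice applies: a real parser such as `datetime.date.fromisoformat` is the better tool.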

***

### Pipeline Examples

#### Bootstrapping a Complete Quality Stack for a Python Project

```bash
# 1. Install tools
uv pip install ruff mypy pytest pytest-cov pre-commit

# 2. Configure ruff, mypy, pytest, and coverage in pyproject.toml (appends to the file)
cat >> pyproject.toml << 'EOF'
[tool.ruff]
line-length = 88

[tool.ruff.lint]
select = ["E", "F", "B", "SIM", "I"]
ignore = ["E501"]

[tool.mypy]
strict = true
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--cov=src --cov-report=term-missing"

[tool.coverage.report]
fail_under = 80
EOF

# 3. Configure pre-commit
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-merge-conflict
      - id: detect-private-key
      - id: end-of-file-fixer
EOF
pre-commit install

# 4. Create Justfile
cat > Justfile << 'EOF'
format:
    ruff format .
lint:
    ruff check --fix .
typecheck:
    mypy src/
test:
    pytest
check:
    ruff format --check . && ruff check . && mypy src/ && pytest
EOF

# 5. Bootstrap CI
mkdir -p .github/workflows
# → write ci.yml (see CI section above)

# 6. Verify everything passes from scratch
just check
```
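The `check` recipe's `&&` chain stops at the first failing tool. A hedged Python sketch of the same semantics follows; invoking each tool as `python -m <tool>` (so it comes from the script's own environment) is an assumption of this sketch, not something the Justfile does.

```python
import subprocess
import sys

# Mirrors the Justfile's `check` recipe, run via the current interpreter.
CHECKS = [
    [sys.executable, "-m", "ruff", "format", "--check", "."],
    [sys.executable, "-m", "ruff", "check", "."],
    [sys.executable, "-m", "mypy", "src/"],
    [sys.executable, "-m", "pytest"],
]


def run_checks(commands: list[list[str]]) -> int:
    """Run commands in order and stop at the first failure, like `a && b && c`."""
    for cmd in commands:
        print("$", " ".join(cmd))
        try:
            code = subprocess.run(cmd).returncode
        except FileNotFoundError:
            code = 127  # shell convention for "command not found"
        if code != 0:
            return code
    return 0
```

Calling `sys.exit(run_checks(CHECKS))` from a script would hand the first failing exit code back to the shell; the Justfile remains the simpler way to get this behavior.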

***

#### Fixing All Linter Errors with an Agent (Iterative Loop)

```bash
# Give the agent a verifiable exit condition it can check itself
claude "Fix all ruff linter errors in this project.
        Run 'ruff check .' to see the current errors.
        After each fix, run 'ruff check .' again to verify progress.
        Stop when 'ruff check .' exits with code 0 (no errors).
        Do not suppress errors with noqa comments unless genuinely necessary —
        fix the underlying issue instead."

# Agent loop:
# 1. ruff check . → reads error list
# 2. Edits offending files
# 3. ruff check . → verifies progress
# 4. Repeats until clean exit
```
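The agent loop above is a check-then-fix cycle with a machine-verifiable exit condition. A minimal sketch of that control flow: `check` would wrap `ruff check .` and return its exit code, and `fix` is a hypothetical stand-in for the agent's edit step. The iteration cap is an added safeguard, not part of the prompt.

```python
from collections.abc import Callable


def fix_until_clean(
    check: Callable[[], int],
    fix: Callable[[], None],
    max_iterations: int = 10,
) -> bool:
    """Alternate check and fix until check() returns 0 or the budget runs out."""
    for _ in range(max_iterations):
        if check() == 0:
            return True  # exit condition met, e.g. `ruff check .` exited 0
        fix()            # hypothetical stand-in for the agent editing files
    return check() == 0  # one last verification after the final fix
```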

***

#### Running a Specific Subset of Tests

```bash
# Run only tests related to URL extraction
pytest -k "url"

# Run only fast tests (skip tests marked @pytest.mark.slow)
pytest -m "not slow"

# Run tests in a specific file
pytest tests/test_extractor.py

# Combine keyword expressions (substring matches on test names, not regex)
pytest -k "extract and not slow"

# Run a specific test by name
pytest tests/test_extractor.py::test_extracts_multiple_urls

# Run with verbose output + stop on first failure
pytest -vx
```

***

#### Debugging a Failing Pre-commit Hook

```bash
# Hook failed — what happened?
$ git commit -m "Add feature"
ruff-format..............................................................Failed
- hook id: ruff-format
- files were modified by this hook

# ruff-format modified the file but the modified version isn't staged.
# Stage the reformatted file and recommit:
git add src/extractor.py
git commit -m "Add feature"
# Now passes ✅

# If a linter error needs manual fixing:
$ git commit -m "Add feature"
ruff.....................................................................Failed
- hook id: ruff
- exit code: 1
src/extractor.py:14:5: B006 Do not use mutable data structures for argument defaults

# Fix the issue manually:
# Change: def add(item, lst=[]):
# To:     def add(item, lst=None):
#           if lst is None: lst = []
git add src/extractor.py
git commit -m "Add feature"
```
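Why B006 matters: a default value is evaluated once, when `def` runs, so a mutable default is shared across calls. A minimal sketch using the `add` function from the fix above (`add_buggy` is a hypothetical name for the original version):

```python
def add_buggy(item, lst=[]):  # noqa: B006 (deliberately wrong, for the demo)
    # The default list is created once and shared by every call that omits lst.
    lst.append(item)
    return lst


def add(item, lst=None):
    # A fresh list is created on every call that omits lst.
    if lst is None:
        lst = []
    lst.append(item)
    return lst


print(add_buggy(1), add_buggy(2))  # [1, 2] [1, 2]  (the same shared list)
print(add(1), add(2))              # [1] [2]
```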

***

### Real-World Workflows

#### Improving Test Coverage with an Agent

```bash
# Generate an HTML coverage report to show the agent exactly which lines are uncovered
pytest --cov=src --cov-report=html --cov-report=term-missing
open htmlcov/index.html   # review visually (macOS; use xdg-open on Linux)

# Give the agent the coverage output and a verifiable target
claude "The test coverage for src/ is currently 54%. The uncovered lines are shown
        in the attached terminal output. Write additional tests in tests/ to bring
        coverage above 80%. After writing each batch of tests, run:
          pytest --cov=src --cov-report=term-missing
        and use the output to guide where to focus next.
        Stop when coverage exceeds 80%.
        Important: review each generated test to ensure it has meaningful
        assertions — don't just call the function without asserting anything."
```

***

#### Setting Up CI for a New Open Source Project

```bash
# Minimal but complete ci.yml that mirrors what you run locally
mkdir -p .github/workflows

cat > .github/workflows/ci.yml << 'EOF'
name: CI
on: [push, pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install uv
      - run: uv pip install --system -e ".[dev]"
      - run: ruff format --check .
      - run: ruff check .
      - run: mypy src/
      - run: pytest --cov=src --cov-report=xml
      - uses: codecov/codecov-action@v4
EOF

# Add a status badge to README.md
echo "[![CI](https://github.com/USER/REPO/actions/workflows/ci.yml/badge.svg)](https://github.com/USER/REPO/actions/workflows/ci.yml)" >> README.md

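# Test by pushing a deliberate violation on a throwaway branch
git switch -c ci-smoke-test
echo "x=1+2" >> src/main.py    # style violation: no spaces around operators
git commit -am "Deliberate formatting violation"
git push -u origin ci-smoke-test
# → CI fails on ruff format --check ✅ (expected)
git switch main
git push origin --delete ci-smoke-test
git branch -D ci-smoke-test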
```

***

#### Using LLMs to Generate Regex Patterns

```
Prompt:
  Write a Python-style regex pattern that extracts the HTTP method and
  requested path from Nginx access log lines. Here's an example line:

  169.254.1.1 - - [09/Jan/2026:21:28:51 +0000] "GET /feed.xml HTTP/2.0" 200 2995

  The pattern should capture: (1) the HTTP method, (2) the path, (3) the status code.

LLM output:
  r'"(\w+) (/[^\s"]*) HTTP/[\d.]+" (\d{3})'

  group(1) → HTTP method (GET, POST, etc.)
  group(2) → path (/feed.xml)
  group(3) → status code (200)

Verify on regex101.com, then use:
  import re
  pattern = re.compile(r'"(\w+) (/[^\s"]*) HTTP/[\d.]+" (\d{3})')
  for line in log_lines:
      m = pattern.search(line)
      if m:
          method, path, status = m.group(1), m.group(2), m.group(3)
```
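A self-contained, runnable version of the snippet above, with `log_lines` filled in from the example line plus one non-matching line (both sample data). `m.groups()` returns all three captures at once:

```python
import re

# The pattern from the LLM output above, compiled once.
LOG_RE = re.compile(r'"(\w+) (/[^\s"]*) HTTP/[\d.]+" (\d{3})')

log_lines = [
    '169.254.1.1 - - [09/Jan/2026:21:28:51 +0000] "GET /feed.xml HTTP/2.0" 200 2995',
    'this line is not an access-log entry',
]

parsed = []
for line in log_lines:
    m = LOG_RE.search(line)
    if m is None:
        continue  # skip lines that don't look like access-log entries
    method, path, status = m.groups()  # same as m.group(1), m.group(2), m.group(3)
    parsed.append((method, path, int(status)))

print(parsed)  # [('GET', '/feed.xml', 200)]
```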

***

### Common Mistakes

#### ❌ Running Formatters in CI Instead of `--check`

**Wrong:**

```yaml
# ci.yml
- run: ruff format .    # silently reformats files in the CI container
                         # your repo is unchanged; the reformat is lost
```

**Correct:**

```yaml
- run: ruff format --check .    # exits non-zero if any file needs reformatting
                                  # forces the developer to format locally first
```

**Why:** If CI reformats for you, the reformatted code lives inside an ephemeral CI container and is thrown away when the job ends, so your repository never gets the fix. Worse, `ruff format .` exits 0 after reformatting, so the job reports success. `--check` enforces that the code is already formatted before it arrives.

***

#### ❌ Suppressing Linter Errors Instead of Fixing Them

**Wrong:**

```python
result = eval(user_input)  # noqa: S307
x = 1 + 2  # noqa  ← suppresses ALL rules on this line
```

**Correct:**

```python
# Best: fix the underlying issue. ast.literal_eval only accepts Python literals,
# so unlike eval() it cannot execute arbitrary code.
import ast
value = ast.literal_eval(config_string)

# For a genuine false positive, suppress only that rule and explain why:
import readline  # noqa: F401 (imported for its side effect: line editing in input())

# In pyproject.toml, disable rules that are genuinely wrong for your project:
#   [tool.ruff.lint]
#   ignore = ["E501"]   # line too long: left to the formatter
```

**Why:** Suppressing rules defeats the purpose of having them. A blanket `noqa` that silences all rules on a line is especially harmful — it hides real issues. Use targeted suppressions with a comment explaining the justification.

***

#### ❌ Treating Code Coverage as the Goal

**Wrong:**

```python
def test_coverage():
    """This test gets us to 100% coverage."""
    for func in [add, subtract, multiply, divide]:
        try:
            func(1, 1)
        except Exception:
            pass   # every line executes; no assertion made
```

**Correct:**

```python
def test_divide_two_positive_numbers():
    assert divide(10, 2) == 5.0

def test_divide_by_zero_returns_none():
    assert divide(10, 0) is None

def test_divide_negative_numbers():
    assert divide(-6, 2) == -3.0
```

**Why:** Coverage measures line execution, not behavioral correctness. A test that calls a function but never asserts anything raises coverage and provides zero protection against regressions. Write tests that could actually fail.

***

#### ❌ Writing Tests That Are Tightly Coupled to Implementation

**Wrong:**

```python
def test_user_creation():
    user = User("Alice")
    assert user._internal_dict["name"] == "Alice"  # ← testing internals
    assert user._id_counter == 1                    # ← brittle implementation detail
```

**Correct:**

```python
def test_user_creation():
    user = User("Alice")
    assert user.name == "Alice"         # test the public interface
    assert user.id is not None          # verify semantics, not internals
    assert isinstance(user.id, str)
```

**Why:** Tests coupled to implementation details break every time you refactor internals, even when behavior is unchanged. Test the public interface and observable behavior.

***

#### ❌ Using Regex for Structured Data Parsing

**Wrong:**

```python
# Extracting fields from HTML, JSON, or other structured formats
email = re.search(r'<input name="email" value="(.+?)"', html).group(1)
name = re.search(r'"name": "(.+?)"', json_str).group(1)   # breaks on escaped quotes
```

**Correct:**

```python
from bs4 import BeautifulSoup
import json

# HTML — use a proper parser
email = BeautifulSoup(html, 'html.parser').find('input', {'name': 'email'})['value']

# JSON — use the language's built-in parser
name = json.loads(json_str)["name"]    # handles escaped quotes, nesting, all edge cases
```

**Why:** HTML and JSON are not regular languages: regex cannot match arbitrarily nested tags or objects, and hand-written patterns routinely break on escaped quotes, reordered attributes, or extra whitespace. Structured formats have dedicated parsers designed to handle every valid input correctly.
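A concrete failure of the regex from the Wrong example, on a made-up JSON string whose value contains escaped quotes:

```python
import json
import re

# The JSON text is: {"name": "Al \"The Pal\" Smith"}
json_str = '{"name": "Al \\"The Pal\\" Smith"}'

# The lazy .+? stops at the first quote, which is the escaped one.
regex_name = re.search(r'"name": "(.+?)"', json_str).group(1)
# The JSON parser understands escapes and returns the full value.
parsed_name = json.loads(json_str)["name"]

print(repr(regex_name))  # 'Al \\'
print(parsed_name)       # Al "The Pal" Smith
```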

***

#### ❌ No `pre-commit` Configuration in the Repo

**Wrong:** Each developer configures (or doesn't configure) their own hooks independently. One developer has `ruff` installed and enabled; another doesn't. PRs arrive with inconsistent formatting.

**Correct:** Commit `.pre-commit-config.yaml` to the repo and document `pre-commit install` in `CONTRIBUTING.md` / `AGENTS.md` / setup instructions. Everyone gets identical hooks, identical versions, automatically.

***

### Exercises

#### Beginner Exercises

1. **Bootstrap a quality stack:** Pick a project you're working on. Configure a formatter (`ruff format` for Python, `prettier` for JS), a linter (`ruff check` or `eslint`), and `pre-commit` hooks. Run the formatter — how many files changed? Run the linter — how many issues appear? Use an AI agent to fix the linter errors, making sure the agent can run the linter itself to iterate autonomously. Review the agent's changes carefully.
2. **Write your first unit tests:** Pick a module you've written with at least three functions. Write unit tests using `pytest`. Run with `--cov-report=html`. Open the HTML report. What percentage of lines are covered? Find an uncovered branch and write a test that covers it.
3. **Set up CI:** Create a `.github/workflows/ci.yml` for a project on GitHub. Include: format check, lint, type check, tests. Push a deliberate violation (e.g., remove spaces around an operator). Verify CI fails and catches it. Fix it. Verify CI passes.
4. **Regex fundamentals:** On [regex101.com](https://regex101.com/), build patterns for: (a) a valid IPv4 address (four groups of 1–3 digits, dot-separated), (b) a Python import statement (`import X` or `from X import Y`), (c) a hexadecimal color code (`#fff` or `#ffffff`). For each, write at least three strings that should match and three that should not. Test all six against your pattern.
5. **`semgrep` vs. `grep`:** Write a Python file with `subprocess.Popen` calls, including: one with `shell=True`, one without, one multiline with `shell=True` on a different line than `Popen`. Use `grep` to try to find the dangerous usages. Now use `semgrep`. Which misses the multiline case?

***

#### Intermediate Exercises

6. **Property-based testing:** Install `hypothesis`. Pick a function with non-trivial edge cases (a sort, a parser, an encoder). Write at least two `@given` tests using appropriate strategies. Run them — does Hypothesis find any edge cases your manual tests missed?
7. **Coverage + agent feedback loop:** Run your test suite with `--cov-report=term-missing`. Give the agent the exact coverage output and a target (e.g., 80%). Let it write tests iteratively, checking coverage after each batch. After the agent finishes, review each test: does it have meaningful assertions? Does it test behavior, not implementation?
8. **Regex search-and-replace in your editor:** In the [raw source](https://raw.githubusercontent.com/missing-semester/missing-semester/refs/heads/master/_2026/code-quality.md) of these lecture notes, use your editor's regex search-and-replace to change all `-` Markdown bullet markers to `*`. The challenge: `-` appears in many non-bullet contexts (code blocks, option flags, compound words). Your regex must match only lines where `-` is used as a list bullet. Test your pattern before applying it.
9. **Write a custom `semgrep` rule:** Write a YAML `semgrep` rule for a pattern that's dangerous in a codebase you work on — examples: `eval()`, `exec()`, `os.system()`, `pickle.loads()` on untrusted input, direct string concatenation into SQL queries. Run it against the codebase and document any findings.
10. **Build a complete `Justfile`:** For a project with at least three quality tools, write a `Justfile` with: `format`, `lint`, `typecheck`, `test`, `check` (runs all four in check-only mode, mirroring CI), and `setup` (installs everything + pre-commit hooks). Verify `just check` produces identical results to your CI pipeline.

***

#### Advanced Challenge

11. **Regex for structured log parsing:** Write a Python script that reads Nginx access logs from stdin, extracts: IP, timestamp, HTTP method, path, status code, and response size, and outputs a JSON object per line. Use a single compiled regex pattern. Test it against log lines with varying formats (different HTTP versions, quoted strings with spaces in the path). Add a `--filter-status` CLI argument that only outputs lines with a matching status code (e.g., `--filter-status 5..` for all 5xx errors).
12. **Matrix CI with dependency caching:** Extend your CI workflow to run a matrix across at least two Python versions and two operating systems. Add dependency caching using `actions/cache` to avoid reinstalling packages on every run. Measure the wall-clock time with and without caching. Add a scheduled trigger (`on: schedule`) to run weekly to catch breakage from external dependency updates.
13. **Mutation testing:** Install [`mutmut`](https://github.com/boxed/mutmut) (Python mutation testing tool). Run it against your test suite. Mutation testing introduces small code changes (mutations) and checks whether your tests catch them. A mutation that your tests don't catch reveals a gap in test quality even where coverage looks complete. How many mutations survive? What do the surviving mutations reveal about your test suite?

***

### Summary

| Tool / Concept     | Purpose                                             | Key Commands                               |
| ------------------ | --------------------------------------------------- | ------------------------------------------ |
| **`ruff format`**  | Enforce surface syntax consistency                  | `ruff format .` / `--check` for CI         |
| **`ruff check`**   | Static analysis for bugs and antipatterns           | `ruff check .` / `--fix` to auto-fix       |
| **`mypy`**         | Static type checking                                | `mypy src/` / `--strict` for full checking |
| **`semgrep`**      | AST-level pattern matching, custom rules            | `semgrep -l python -e "pattern"`           |
| **`pytest`**       | Test runner: unit, integration, functional          | `pytest -v -x -k "pattern"`                |
| **`hypothesis`**   | Property-based testing (auto-generates inputs)      | `@given(st.text())`                        |
| **Code coverage**  | Measures which lines execute in tests               | `pytest --cov=src --cov-report=html`       |
| **`pre-commit`**   | Run quality tools automatically before every commit | `pre-commit install` / `--all-files`       |
| **GitHub Actions** | CI: run quality tools on every push/PR              | `.github/workflows/ci.yml`                 |
| **`just`**         | Command runner: short aliases for long commands     | `just lint`, `just test`, `just check`     |
| **Regex**          | Pattern language for string matching and extraction | `re.match`, `re.findall`, `re.sub`         |

#### The Golden Rule of Code Quality Infrastructure

```
Local          →  Pre-commit     →  CI
(editor, on    →  (on commit,    →  (on push, neutral
 save, fast)      local, blocks     environment, blocks
                  bad commits)      bad merges)

Rule: Every quality check you run manually should also run automatically.
      Every check that runs in CI should also be runnable in one command locally.
      'just check' == what CI does == what pre-commit enforces.
      No surprises when you push.
```

#### What's Next

This is the final lecture of the course. You've now covered the full stack of tools and practices that working engineers use every day — from the shell and version control, through debugging, packaging, and agentic coding, to the quality infrastructure that keeps codebases healthy over time.

The best way to consolidate everything is to use it. Pick a real project, set up the full quality stack, contribute to something open source, or build something from scratch with these tools in hand. The course materials will stay available at [missing.csail.mit.edu](https://missing.csail.mit.edu) — come back to any lecture when a specific tool becomes relevant to what you're working on.

***

*Source:* [*MIT Missing Semester – Code Quality*](https://missing.csail.mit.edu/2026/code-quality/) *Licensed under* [*CC BY-NC-SA 4.0*](https://creativecommons.org/licenses/by-nc-sa/4.0)

