# Lecture 5 – Version Control and Git

### Overview

Version control is one of the most important tools in a software developer's daily workflow. It answers questions that would otherwise be nearly impossible to answer: *Who changed this line? When? Why did that test start failing three months ago? How do I work on two features at once without one breaking the other?*

While many version control systems exist, **Git** is the de facto standard: it underlies GitHub and GitLab and is used by the vast majority of software projects. But Git has a reputation for being confusing. The reason is almost always the same: people learn Git **top-down** (memorising commands) instead of **bottom-up** (understanding the data model those commands manipulate).

This lecture takes the bottom-up approach. Once you understand Git's data model — objects, references, and the commit DAG — every command becomes *readable* rather than magical. You'll stop cargo-culting `git reset --hard` and start understanding exactly what it does to the graph.

#### Key Takeaways

* Git models history as a **directed acyclic graph (DAG)** of snapshots — not a linear list.
* Almost everything Git stores is one of three core **content-addressed objects**: blob (file), tree (directory), or commit (snapshot). (Annotated tags are a fourth object type, not covered here.) Objects are immutable and identified by a hash of their content (SHA-1 by default).
* **References** (branches, tags, HEAD) are mutable human-readable pointers to specific commits.
* The **staging area** (index) gives you fine-grained control over exactly what goes into each commit — separating *what changed* from *what to record*.
* Every `git` command is ultimately a manipulation of the object store and reference map. Understanding that unlocks everything.
* **Branches are cheap** — a branch is just a reference (a file containing a hash). Creating one is instant.
* **Merging and rebasing** normally produce the same final code but different commit histories. The difference matters for collaboration.
* Git ≠ GitHub. Git is the version control system; GitHub is a hosting platform with its own collaboration model (pull requests, forks).

***

### Core Concepts

#### Git's Data Model: Objects

Git's power comes from a remarkably simple data model. There are three core kinds of objects (a fourth, the annotated tag object, is left out here):

| Object     | What it is                                             | Mutable? |
| ---------- | ------------------------------------------------------ | -------- |
| **blob**   | Raw file contents (just bytes)                         | No       |
| **tree**   | A directory: maps names → blobs or trees               | No       |
| **commit** | A snapshot: points to a tree, has parents and metadata | No       |

In pseudocode:

```
type blob   = array<byte>
type tree   = map<string, blob | tree>
type commit = {
    parents:  array<commit>
    author:   string
    message:  string
    snapshot: tree
}
type object = blob | tree | commit
```

Every object is stored in Git's object database **content-addressed by its SHA-1 hash** — the ID of an object is derived from its content. This means:

* **Identical content → identical hash**: Git stores each distinct piece of content only once.
* **Tamper-evident**: if any bit changes, the hash changes, and Git can detect the mismatch.
* **Objects are immutable**: you can never edit a commit; you can only create a new one.
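
The three properties above can be checked with a minimal Python sketch, assuming the default SHA-1 object format: Git hashes a short header (the word `blob`, the size in bytes, a NUL byte) followed by the file contents. The function name `git_blob_id` is hypothetical, chosen only for illustration.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the ID Git would assign to a blob with these contents (SHA-1 format)."""
    header = b"blob %d\x00" % len(data)   # object type, size in bytes, NUL separator
    return hashlib.sha1(header + data).hexdigest()

a = git_blob_id(b"hello world\n")
b = git_blob_id(b"hello world\n")
c = git_blob_id(b"hello world!\n")

print(a == b)             # → True: identical content gives an identical ID
print(a == c)             # → False: a one-byte change gives a different ID
print(git_blob_id(b""))   # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

For a real file, the result should match what `git hash-object <file>` prints.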

```bash
# Inspect any object directly (Git's low-level plumbing)
git cat-file -p HEAD              # print the current commit object
git cat-file -p HEAD^{tree}       # print the tree it points to
git cat-file -p <blob-hash>       # print raw file contents
git cat-file -t <hash>            # show the type: blob / tree / commit
```

A concrete example of what a tree object looks like inside (the hashes are illustrative; Git sorts entries by name):

```
100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85    README.md
100644 blob 9a3e2b1c4d5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b    main.py
040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87    src
```

***

#### The Commit DAG: Modeling History

A repository's history is a **directed acyclic graph (DAG)** of commit objects. Each commit points to its parent(s). A linear history looks like this:

```
C1 ← C2 ← C3 ← C4    (arrows point to parent)
```

When two branches of development diverge and are later merged, the commit graph forks and rejoins:

```
C1 ← C2 ← C3 ← C4 ←──── M      ← merge commit (has two parents)
          ↑             │
          └── C5 ← C6 ←─┘
```

This is a true graph structure, not a linear list. The ability to navigate this graph — jump to any commit, compare any two, create branches, merge them — is what makes Git powerful.

> **The key insight:** Every Git command is a manipulation of this graph. `git commit` adds a node. `git branch` adds a reference pointer. `git merge` adds a node with two parents. `git reset` moves a reference pointer. Once you see Git as graph manipulation, every command makes sense.
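
To make the graph view concrete, here is a small Python sketch of a commit DAG stored as a mapping from each commit to its parents. The commit IDs are placeholders, and the side branch is assumed to fork from C3; neither is taken from a real repository.

```python
# Toy commit graph: each commit ID maps to the list of its parent IDs.
parents = {
    "C1": [],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C3"],
    "C5": ["C3"],          # side branch forks from C3 (assumption for this sketch)
    "C6": ["C5"],
    "M":  ["C4", "C6"],    # merge commit: two parents
}

def ancestors(commit: str) -> set:
    """All commits reachable from `commit` by following parent edges, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

print(sorted(ancestors("C6")))   # → ['C1', 'C2', 'C3', 'C5', 'C6']
print("C4" in ancestors("C6"))   # → False: C4 and C6 sit on diverged lines
print(len(ancestors("M")))       # → 7: the merge commit reaches every commit
```

Reachability of this kind is what commands such as `git log` walk when they list history.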

***

#### References: Human-Readable Names for Hashes

SHA-1 hashes are 40 hex characters long: `5d83f9e5c64b3c2a...`. Nobody remembers those. Git's solution: **references** — mutable human-readable names that point to specific commits.

```
references = map<string, commit-hash>
```

| Reference type      | Example                | Points to                                                |
| ------------------- | ---------------------- | -------------------------------------------------------- |
| **Branch**          | `main`, `feature/auth` | Latest commit on that branch                             |
| **Tag**             | `v1.0.0`, `v2.3.1`     | A specific commit (often a release)                      |
| **HEAD**            | `HEAD`                 | "Where you currently are" — the current branch or commit |
| **Remote tracking** | `origin/main`          | Last known state of `main` on the remote                 |

**HEAD** is special: it's the reference that answers "where am I right now in the history?" When you make a new commit, Git:

1. Creates a new commit object whose parent is `HEAD`'s current target
2. Moves the current branch reference forward to the new commit
3. HEAD still points to the branch, which now points to the new commit

When HEAD points directly to a commit hash (not a branch), you're in **detached HEAD state** — you can look around but any new commits won't be attached to a named branch.
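
The commit steps and the detached case can be sketched in a few lines of Python. This is a hypothetical in-memory model (the names `refs`, `head`, and `parent_of` are invented for illustration), not how Git stores references on disk.

```python
refs = {"main": "C4"}        # branch name -> commit ID
head = ("branch", "main")    # HEAD is ("branch", name) or ("detached", commit_id)
parent_of = {}               # new commit ID -> parent commit ID

def resolve_head() -> str:
    """Commit ID that HEAD currently resolves to."""
    kind, target = head
    return refs[target] if kind == "branch" else target

def commit(new_id: str) -> None:
    """Record a new commit and advance whatever HEAD points at."""
    global head
    parent_of[new_id] = resolve_head()   # step 1: the parent is HEAD's current target
    kind, target = head
    if kind == "branch":
        refs[target] = new_id            # step 2: the branch moves; HEAD still names it
    else:
        head = ("detached", new_id)      # detached: only HEAD moves, no branch follows

commit("C5")
print(refs["main"], head)    # → C5 ('branch', 'main')

head = ("detached", "C4")    # like checking out a commit hash directly
commit("C6")
print(refs["main"], head)    # → C5 ('detached', 'C6')
```

After the second commit no branch points at `C6`, which is why commits made in detached HEAD state are easy to lose track of.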

***

#### The Staging Area (Index)

The staging area is Git's mechanism for composing clean, logical commits.

**The three zones:**

```
Working Directory  →  Staging Area  →  Repository
  (your files)       (git add)         (git commit)
```

* **Working directory**: your actual files as you edit them
* **Staging area (index)**: a proposed snapshot — what will be in the next commit
* **Repository**: permanent commit history

This separation lets you:

* Commit only part of your changes (e.g., bugfix without debug prints)
* Build up a commit incrementally with `git add -p`
* Review exactly what will be committed before committing

```bash
git diff             # working dir vs staging area (unstaged changes)
git diff --staged    # staging area vs last commit (what will be committed)
git diff HEAD        # working dir vs last commit (all changes)
```
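
The three comparisons above can be modelled by treating each zone as a mapping from path to contents. The following Python sketch uses made-up file contents and a hypothetical helper `changed_paths`; it only reports which paths differ, not the line-level diff Git prints.

```python
# Each zone is a mapping of path -> file contents (illustrative data).
head_commit = {"auth.py": "v1", "main.py": "v1"}
index       = {"auth.py": "v2", "main.py": "v1"}   # auth.py was staged with `git add`
working_dir = {"auth.py": "v2", "main.py": "v2"}   # main.py edited but not staged

def changed_paths(old: dict, new: dict) -> list:
    """Paths whose contents differ between two snapshots (added, removed, or modified)."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))

print(changed_paths(index, working_dir))        # like `git diff`          → ['main.py']
print(changed_paths(head_commit, index))        # like `git diff --staged` → ['auth.py']
print(changed_paths(head_commit, working_dir))  # like `git diff HEAD`     → ['auth.py', 'main.py']
```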

***

#### Branches and Merging

A **branch** in Git is just a file containing a commit hash — a moveable pointer. Creating one is free and instant. This is why branches are used so heavily in Git workflows.

```bash
git branch feature/login    # create branch pointing to current commit
git switch feature/login    # move HEAD to that branch
# (or in one step:)
git checkout -b feature/login
```

Every commit you make on the new branch advances that branch's pointer, while `main` stays untouched.

**Merging** integrates one branch into another:

```bash
git switch main
git merge feature/login
```

If the histories diverged, Git creates a **merge commit** with two parents. If one branch is a direct ancestor of the other, Git does a **fast-forward** (just moves the pointer forward — no merge commit needed).
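
The choice between a fast-forward and a merge commit comes down to an ancestor check. Here is a rough Python sketch of that decision on a toy graph; the commit IDs and the function names are hypothetical, and conflict handling is deliberately left out.

```python
# Toy history: commit ID -> list of parent IDs.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

def is_ancestor(older: str, newer: str) -> bool:
    """True if `older` is reachable from `newer` via parent edges."""
    stack, seen = [newer], set()
    while stack:
        current = stack.pop()
        if current == older:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False

def merge(current: str, other: str) -> str:
    """Return the commit the current branch points at after merging `other`."""
    if is_ancestor(other, current):
        return current                       # already up to date: nothing to do
    if is_ancestor(current, other):
        return other                         # fast-forward: just move the pointer
    merge_id = f"M({current},{other})"       # diverged: new commit with two parents
    parents[merge_id] = [current, other]
    return merge_id

print(merge("B", "C"))   # → C        (B is an ancestor of C: fast-forward)
print(merge("C", "D"))   # → M(C,D)   (C and D diverged after B: merge commit)
```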

**Rebasing** is an alternative to merging that rewrites commit history to be linear:

```bash
git switch feature/login
git rebase main
```

This takes the commits on `feature/login` that aren't on `main` and replays them on top of the current tip of `main`, producing a linear history. The replayed commits get new hashes because they are new objects with different parents. Never rebase commits that others have already based work on.
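
Why replayed commits get new hashes can be shown with a toy model in which a commit's ID is a hash over its parent's ID and its message. This is a simplification (real Git also hashes the tree, author, and timestamps), and `commit_id` and `rebase` are hypothetical helper names.

```python
import hashlib

def commit_id(parent: str, message: str) -> str:
    """Toy commit ID: a hash over the parent ID and the message."""
    return hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()

# main:    base -> m1
# feature: base -> f1 -> f2
base = commit_id("", "base")
m1 = commit_id(base, "main work")
f1 = commit_id(base, "Add login form")
f2 = commit_id(f1, "Validate input")

def rebase(messages: list, onto: str) -> list:
    """Replay commits (given by message, oldest first) on top of `onto`."""
    new_ids, parent = [], onto
    for message in messages:
        parent = commit_id(parent, message)   # same change, new parent, so a new ID
        new_ids.append(parent)
    return new_ids

f1_new, f2_new = rebase(["Add login form", "Validate input"], onto=m1)
print(f1_new != f1, f2_new != f2)   # → True True: the replayed commits are new objects
```

Because each ID depends on the parent's ID, changing the base changes every commit replayed on top of it.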

***

#### Remotes and Collaboration

A **remote** is another copy of the repository — on GitHub, a colleague's machine, or a server. The canonical workflow:

```bash
git clone <url>                      # copy a remote repo locally
git fetch                            # download remote changes (don't merge)
git pull                             # git fetch followed by git merge
git push origin main                 # upload local commits to remote
```

**Remote-tracking branches** (`origin/main`, `origin/feature`) are your local read-only view of what was on the remote last time you `fetch`ed.

The three-step collaboration loop:

```
1. git fetch           → get remote changes into origin/main
2. git merge origin/main  → integrate them into your local main
3. git push            → share your commits
```

(or: `git pull` does steps 1+2 together.)
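
The difference between fetching and merging can be sketched as two separate updates to the reference map. This Python model is an illustration only: it uses invented commit IDs and assumes there are no local commits, so the merge is a plain fast-forward.

```python
# Hypothetical state: the remote has moved on to C5; we last fetched at C3.
remote_refs = {"main": "C5"}
local_refs = {"main": "C3", "origin/main": "C3"}

def fetch() -> None:
    """Update remote-tracking refs only; local branches are untouched."""
    for name, commit in remote_refs.items():
        local_refs[f"origin/{name}"] = commit

def fast_forward(branch: str, upstream: str) -> None:
    """Simplest case of `git merge`: move the branch to the upstream commit."""
    local_refs[branch] = local_refs[upstream]

fetch()
print(local_refs)   # → {'main': 'C3', 'origin/main': 'C5'}

fast_forward("main", "origin/main")
print(local_refs)   # → {'main': 'C5', 'origin/main': 'C5'}
```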

***

#### Undoing Things: The Right Tool for the Job

Git has several undo mechanisms, each appropriate for a different situation:

| Situation                       | Command                       | What it does                           |
| ------------------------------- | ----------------------------- | -------------------------------------- |
| Unstage a file                  | `git restore --staged <file>` | Remove from staging area, keep changes |
| Discard working changes         | `git restore <file>`          | Restore file from the index (the last commit if nothing is staged) |
| Amend last commit               | `git commit --amend`          | Replace last commit with a new one     |
| Undo staged changes (older API) | `git reset <file>`            | Unstage                                |
| Move branch back (keep changes) | `git reset --soft HEAD~1`     | Undo last commit, keep in staging      |
| Move branch back (unstage)      | `git reset HEAD~1`            | Undo last commit, keep in working dir  |
| Move branch back (discard)      | `git reset --hard HEAD~1`     | Undo last commit, discard changes      |
| Undo a commit (safely)          | `git revert <commit>`         | Create a new commit that undoes it     |

> **`git revert` vs `git reset`:** `reset` rewrites history (dangerous for shared branches); `revert` adds a new commit that undoes the change (safe for any branch).

***

#### Advanced Git Tools

| Tool              | What it does                                | When to use                                    |
| ----------------- | ------------------------------------------- | ---------------------------------------------- |
| `git stash`       | Temporarily shelve uncommitted changes      | Switching context mid-work                     |
| `git bisect`      | Binary search history for a regression      | "It worked 3 months ago, when did it break?"   |
| `git blame`       | Show last edit of every line                | "Who wrote this and why?"                      |
| `git add -p`      | Interactively stage hunks                   | Building clean, focused commits                |
| `git rebase -i`   | Interactively edit, reorder, squash commits | Cleaning up a branch before merging            |
| `git cherry-pick` | Apply a specific commit to current branch   | Backporting a bugfix to an older release       |
| `git worktree`    | Check out multiple branches simultaneously  | Working on two features at once                |
| `git reflog`      | Log of all HEAD movements                   | Recovering from "I accidentally reset too far" |

***

### Mental Model

#### Git as a Graph Database

Stop thinking of Git as a "save button with history". Think of it as a **graph database** where:

* **Nodes** = commit objects (immutable, content-addressed)
* **Edges** = parent relationships between commits
* **Labels** = references (branches, tags, HEAD)

Every `git` command is a database operation on this graph:

```
git commit    → INSERT new node, MOVE branch label forward
git branch    → INSERT new label pointing to current node
git merge     → INSERT new node with two parent edges
git reset     → MOVE label to a different existing node
git rebase    → COPY nodes to new positions, MOVE label
git checkout  → MOVE HEAD label
```

When you're stuck, ask: "What do I want the graph to look like after this command?" Then find the command that achieves that transformation.

***

#### The Three Trees

Git operates on three "trees" (file states) simultaneously:

```
┌─────────────────┐     git add      ┌──────────────┐    git commit   ┌────────────┐
│ Working          │ ───────────────► │ Staging Area │ ──────────────► │ Repository │
│ Directory        │                  │ (Index)      │                  │ (HEAD)     │
│ (what you see)   │ ◄─────────────── │              │ ◄────────────── │            │
└─────────────────┘  git restore      └──────────────┘  git reset       └────────────┘
```

* `git diff` compares Working Directory ↔ Staging Area
* `git diff --staged` compares Staging Area ↔ Repository
* `git diff HEAD` compares Working Directory ↔ Repository

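This model explains why the `git reset` modes behave differently: each one resets a different number of trees. `git reset --soft HEAD~1` moves only the branch, `git reset HEAD~1` (the default, `--mixed`) also resets the staging area but keeps your working files, and `git reset --hard HEAD~1` resets the working directory as well, discarding the changes.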
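
A compact way to see the reset modes is to model the three trees as dictionaries and apply each mode in turn. The sketch below is a simplification under stated assumptions: a single tracked file, invented contents, and a hypothetical `reset` function.

```python
def reset(state: dict, target: dict, mode: str = "mixed") -> dict:
    """Return the three-tree state after moving the branch to snapshot `target`.

    `state` has keys "head", "index", "working"; each value maps path -> contents.
    """
    new = dict(state)
    new["head"] = dict(target)           # every mode moves the branch
    if mode in ("mixed", "hard"):
        new["index"] = dict(target)      # --mixed and --hard also reset the index
    if mode == "hard":
        new["working"] = dict(target)    # --hard also overwrites the working directory
    return new

previous = {"app.py": "v1"}              # snapshot at HEAD~1
current = {
    "head":    {"app.py": "v2"},
    "index":   {"app.py": "v2"},
    "working": {"app.py": "v2"},
}

for mode in ("soft", "mixed", "hard"):
    after = reset(current, previous, mode)
    print(mode, after["index"]["app.py"], after["working"]["app.py"])
# → soft v2 v2    (the change stays staged)
# → mixed v1 v2   (the change stays only in the working directory)
# → hard v1 v1    (the change is gone from all three trees)
```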

***

#### Content Addressing: Why Objects are Immutable

When Git stores an object, it hashes the content to get the ID. This has a profound consequence: **you cannot edit history, only extend it**.

```
"Fix typo in README" commit → SHA: a3f9b2...
  └── changing any single byte → SHA: d7c4e1...  (completely different)
```

This is why `git commit --amend` doesn't really amend — it creates a brand new commit with a new hash, and moves the branch reference to point to it. The old commit still exists in the object store (until garbage collected). `git reflog` can find it.

The practical implication: **any operation that rewrites history (amend, rebase, or resetting a branch to an earlier commit) should only be done on commits you haven't shared with others**, because their local copies still point to the old commits.

***

#### HEAD, Branch, and Detached HEAD

```
Normal state:
  HEAD → main → commit C4

  Meaning: "I am on branch main, which points to C4"
  git commit → creates C5, moves main to C5, HEAD still points to main

Detached HEAD state:
  HEAD → commit C4   (directly, not through a branch)

  Meaning: "I am looking at commit C4, not on any branch"
  git commit → creates C5, moves HEAD to C5, but no branch is updated
  → C5 will be unreachable once you switch away (unless you create a branch)
```

To keep commits made in detached HEAD state, run `git switch -c new-branch-name` (or `git checkout -b new-branch-name`) while HEAD still points at them.

***

### Commands and Syntax

#### Setting Up a Repository

```bash
# Start a new repo
git init
git init my-project           # create directory + init

# Clone an existing repo
git clone https://github.com/user/repo.git
git clone --depth=1 <url>     # shallow clone (faster, no full history)
git clone <url> my-dirname    # clone into a specific directory
```

***

#### The Daily Cycle

```bash
git status                    # what changed? what's staged?
git add <file>                # stage a specific file
git add .                     # stage everything in current directory
git add -p                    # interactively choose which hunks to stage
git diff                      # unstaged changes
git diff --staged             # staged changes (what will be in next commit)
git commit                    # open editor to write commit message
git commit -m "message"       # commit with inline message
git commit --amend            # revise the last commit (message or content)
git log                       # view history
git log --oneline             # compact one-line format
git log --all --graph --decorate --oneline   # full DAG visualization
```

***

#### Branching and Merging

```bash
git branch                    # list local branches
git branch -a                 # list all branches (including remote-tracking)
git branch <name>             # create a new branch
git switch <name>             # switch to a branch
git checkout -b <name>        # create and switch in one step
git branch -d <name>          # delete a merged branch
git branch -D <name>          # force-delete (even if not merged)

git merge <branch>            # merge branch into current branch
git merge --no-ff <branch>    # always create a merge commit (no fast-forward)
git merge --abort             # abort a merge in progress

git rebase <branch>           # rebase current branch onto another
git rebase -i HEAD~3          # interactively rebase the last 3 commits
git rebase --abort            # abort a rebase in progress
```

***

#### Working with Remotes

```bash
git remote -v                             # list remotes with URLs
git remote add origin <url>              # add a remote named "origin"
git remote set-url origin <new-url>      # change a remote's URL

git fetch                                 # download all remote changes
git fetch origin                          # fetch from 'origin' specifically
git pull                                  # fetch + merge current branch
git pull --rebase                         # fetch + rebase (cleaner history)

git push origin main                      # push local main to remote
git push -u origin main                   # push + set upstream tracking
git push --force-with-lease              # safer force push (checks for others' work)
git push origin --delete <branch>        # delete remote branch
```

***

#### Undoing and Recovering

```bash
git restore <file>                   # discard working dir changes
git restore --staged <file>          # unstage a file
git reset HEAD~1                     # undo last commit, keep changes unstaged
git reset --soft HEAD~1              # undo last commit, keep changes staged
git reset --hard HEAD~1              # undo last commit, discard all changes
git revert <commit-hash>             # new commit that undoes an old one (safe)
git revert HEAD                      # undo the very last commit safely

git stash                            # shelve uncommitted changes to tracked files (-u adds untracked)
git stash push -m "message"          # stash with a descriptive name
git stash list                       # list all stashes
git stash pop                        # apply most recent stash + remove it
git stash apply stash@{2}            # apply a specific stash, keep it

git reflog                           # log of every HEAD movement (recovery lifeline)
git checkout <hash-from-reflog>      # recover a "lost" commit
```

***

#### Inspection and Investigation

```bash
git log --oneline --graph --all      # visual DAG of all branches
git log --follow <file>              # history of a specific file (through renames)
git log -S "function_name"           # find commits that added/removed a string
git log --author="Alice"             # filter by author
git log --since="2 weeks ago"        # filter by time

git show <commit>                    # show a commit's diff and metadata
git show HEAD                        # show the current commit
git show HEAD:path/to/file           # show file content at a specific commit

git diff <commit1> <commit2>         # diff between two commits
git diff main..feature               # diff between tips of two branches
git diff main...feature              # diff from their common ancestor

git blame <file>                     # who last changed each line, and when
git blame -L 10,20 <file>            # blame only lines 10-20

git bisect start                     # begin binary search for regression
git bisect bad                       # mark current commit as broken
git bisect good <commit>             # mark an older commit as working
git bisect reset                     # end bisect session
```
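
The two-dot and three-dot forms of `git diff` differ in which commit is used as the left-hand side: `main..feature` compares the two branch tips, while `main...feature` compares the branches' common ancestor (the merge base) with `feature`. A rough Python sketch of finding that merge base on a toy graph, with invented commit IDs and hypothetical function names:

```python
# Toy history: commit -> list of parents.
parents = {
    "C1": [], "C2": ["C1"], "C3": ["C2"],
    "C4": ["C3"],                  # main continues from C3
    "C5": ["C3"], "C6": ["C5"],    # feature forks from C3
}
tips = {"main": "C4", "feature": "C6"}

def ancestors(commit: str) -> set:
    """Commits reachable from `commit` through parent edges, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def merge_bases(a: str, b: str) -> list:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(d) for d in common))

print(merge_bases(tips["main"], tips["feature"]))   # → ['C3']
```

In this toy history, `git diff main...feature` would compare C3 with C6, whereas `git diff main..feature` would compare C4 with C6.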

***

#### Configuration and Aliases

```bash
# Identity (required for commits)
git config --global user.name "Alice Smith"
git config --global user.email "alice@example.com"

# Default branch name
git config --global init.defaultBranch main

# Useful aliases
git config --global alias.graph "log --all --graph --decorate --oneline"
git config --global alias.st status
git config --global alias.co checkout
git config --global alias.unstage "restore --staged"
git config --global alias.last "log -1 HEAD"

# Show current config
git config --list
git config --global --list
```

***

#### `.gitignore`

```gitignore
# Ignore compiled files
*.pyc
*.o
*.class
__pycache__/
*.egg-info/

# Ignore build output
dist/
build/
target/

# Ignore IDE files
.vscode/
.idea/
*.swp

# Ignore OS files
.DS_Store
Thumbs.db

# Ignore secrets and credentials
.env
.env.*
*.key
*.pem
secrets.json

# But track this specific file even though it matches `.env.*` above
!.env.example
```

Set a global gitignore for things that apply everywhere:

```bash
git config --global core.excludesfile ~/.gitignore_global
```

***

### Command Flow Diagrams

#### Git's Object Model

```mermaid
graph TD
    subgraph "Object Store (content-addressed by SHA-1)"
        COMMIT["commit a3f9b2\n─────────────\nauthor: Alice\nmessage: Add login\nparent: 7c2e44\nsnapshot: tree 5f8a1b"]
        TREE1["tree 5f8a1b\n─────────────\nREADME.md → blob d4e2f1\nsrc/ → tree 9c3b2a"]
        TREE2["tree 9c3b2a\n─────────────\nauth.py → blob 1b4c8d\nmain.py → blob 7a2e9f"]
        BLOB1["blob d4e2f1\n─────────────\n# My Project\n..."]
        BLOB2["blob 1b4c8d\n─────────────\ndef login(user):\n    ..."]
    end

    COMMIT -->|"snapshot →"| TREE1
    TREE1 -->|"src/ →"| TREE2
    TREE1 -->|"README.md →"| BLOB1
    TREE2 -->|"auth.py →"| BLOB2

    style COMMIT fill:#4a2d00,color:#fff
    style TREE1 fill:#1a3a5c,color:#fff
    style TREE2 fill:#1a3a5c,color:#fff
    style BLOB1 fill:#2d4a22,color:#fff
    style BLOB2 fill:#2d4a22,color:#fff
```

***

#### The Commit DAG with References

```mermaid
gitGraph
   commit id: "C1: Initial"
   commit id: "C2: Add README"
   commit id: "C3: Add auth"
   branch feature/login
   commit id: "C5: Login form"
   commit id: "C6: Validate input"
   checkout main
   commit id: "C4: Fix typo"
   merge feature/login id: "M1: Merge login"
   commit id: "C7: Bump version"
```

***

#### The Three Trees: Working Directory, Staging, Repository

```mermaid
sequenceDiagram
    participant WD as Working Directory
    participant SA as Staging Area
    participant REPO as Repository (HEAD)

    Note over WD: You edit auth.py

    WD->>SA: git add auth.py
    Note over SA: auth.py staged

    WD->>WD: You edit main.py (not staged yet)

    SA->>REPO: git commit -m "Add auth"
    Note over REPO: New commit C5 created<br/>main.py NOT included

    REPO->>WD: git restore auth.py
    Note over WD: Reverts auth.py to<br/>last committed state
```

***

#### Branch Creation and Merge Workflow

```mermaid
graph LR
    subgraph "Before merge"
        C1["C1"] --> C2["C2"] --> C3["C3 (main)"]
        C2 --> C4["C4"] --> C5["C5 (feature)"]
    end

    subgraph "After: git merge feature"
        A1["C1"] --> A2["C2"] --> A3["C3"] --> M["M (merge commit)\nmain"]
        A2 --> A4["C4"] --> A5["C5"] --> M
    end

    subgraph "After: git rebase main (on feature)"
        B1["C1"] --> B2["C2"] --> B3["C3 (main)"] --> B4["C4' (replayed)"] --> B5["C5' (replayed)\nfeature"]
    end
```

***

#### Remote Collaboration Flow

```mermaid
sequenceDiagram
    participant Local
    participant Remote as Remote (origin)
    participant Colleague

    Colleague->>Remote: git push (adds commits)
    Local->>Remote: git fetch
    Note over Local: origin/main updated<br/>local main unchanged
    Local->>Local: git merge origin/main
    Note over Local: Fast-forward or<br/>merge commit
    Local->>Local: git commit (new work)
    Local->>Remote: git push
    Note over Remote: Remote updated
```

***

#### git bisect: Binary Search for Regressions

```mermaid
graph LR
    C1["C1 ✅ good"] --> C2["C2"] --> C3["C3"] --> C4["C4"] --> C5["C5"] --> C6["C6"] --> C7["C7 ❌ bad"]

    C4["C4 ← bisect checks here\n(midpoint)"]
    style C1 fill:#2d4a22,color:#fff
    style C7 fill:#4a0000,color:#fff
    style C4 fill:#4a3a00,color:#fff

    NOTE["Each step halves the search space.\n7 commits → found in ≤3 checks.\n1000 commits → found in ≤10 checks."]
```
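
The halving claim in the diagram can be checked with a small Python sketch of the search itself. It assumes the first commit is known good, the last is known bad, and every commit after the first bad one is also bad; `first_bad` is a hypothetical helper, not part of Git.

```python
def first_bad(commits: list, is_bad) -> tuple:
    """Binary search for the first bad commit.

    Returns (first bad commit, number of commits tested).
    """
    lo, hi = 0, len(commits) - 1    # invariant: commits[lo] is good, commits[hi] is bad
    checks = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        checks += 1
        if is_bad(commits[mid]):
            hi = mid                # first bad commit is at mid or earlier
        else:
            lo = mid                # first bad commit is after mid
    return commits[hi], checks

history = [f"C{i}" for i in range(1, 8)]     # C1 .. C7
bad_from = 5                                  # hypothetical: C5 introduced the bug
found, checks = first_bad(history, lambda c: int(c[1:]) >= bad_from)
print(found, checks)   # → C5 2
```

In this model no position of the bug needs more than 3 checks for 7 commits, or more than 10 checks for 1000 commits.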

***

### Command Pipeline Examples

#### Visualise the Full Repository History

```bash
git log --all --graph --decorate --oneline
```

| Part         | What it does                              |
| ------------ | ----------------------------------------- |
| `--all`      | Show all branches and tags, not just HEAD |
| `--graph`    | Draw ASCII art of the commit DAG          |
| `--decorate` | Show branch and tag names next to commits |
| `--oneline`  | One line per commit: hash + message       |

Add this as an alias: `git config --global alias.graph "log --all --graph --decorate --oneline"`

***

#### Find Who Introduced a Bug (git blame + git show)

```bash
# Step 1: find the line and who last touched it
git blame -L 42,50 src/auth.py

# Step 2: see the full context of that commit (hash taken from the blame output)
git show a3f9b2e

# Step 3: find when a specific string was added
git log -S "verify_token" --oneline
```

***

#### Interactive Staging: Build a Clean Commit

```bash
# Instead of staging the whole file, choose specific changes
git add -p auth.py

# Git shows each "hunk" and asks: stage this? [y/n/s/q/?]
#   y → stage this hunk
#   n → skip this hunk
#   s → split into smaller hunks
#   q → quit staging
#   ? → help

# Then commit only the staged hunks
git commit -m "Fix authentication bypass — input validation only"
```

***

#### Binary Search for a Regression (git bisect)

```bash
# Start bisect
git bisect start

# Mark current state as broken
git bisect bad

# Mark a known-good old commit
git bisect good v1.0.0

# Git checks out the midpoint commit
# Run your test:
./run_tests.sh

# Tell git the result:
git bisect good   # or: git bisect bad

# Repeat until git identifies the first bad commit
# → "a3f9b2e is the first bad commit"

# Clean up
git bisect reset
```

***

#### Interactive Rebase: Clean Up Before Merging

```bash
# Squash and clean up the last 4 commits before merging to main
git rebase -i HEAD~4

# Git opens an editor with:
# pick a3f9b2e Add login form
# pick 7c2e441 wip
# pick d4e2f1a fix typo
# pick 1b4c8d9 fix tests

# Change to:
# pick a3f9b2e Add login form
# squash 7c2e441 wip
# fixup d4e2f1a fix typo
# squash 1b4c8d9 fix tests

# Result: one commit containing all four changes. Git opens an editor so you can
# combine the squashed messages; the fixup message is discarded.
```

***

### Real World Workflows

#### The Feature Branch Workflow

```bash
# 1. Start fresh from latest main
git switch main
git pull origin main

# 2. Create a feature branch
git checkout -b feature/password-reset

# 3. Do work in small, focused commits
git add src/auth/reset.py
git commit -m "Add password reset token generation"

git add src/auth/email.py tests/test_reset.py
git commit -m "Add reset email sending and tests"

# 4. Keep up with main while working
git fetch origin
git rebase origin/main        # replay your commits on top of latest main

# 5. Push and open a pull request
git push -u origin feature/password-reset

# 6. After PR is approved, merge and clean up
git switch main
git pull origin main          # now includes your feature
git branch -d feature/password-reset
```

***

#### Recovering a "Lost" Commit

```bash
# Scenario: you ran git reset --hard HEAD~3 and lost commits

# reflog records every HEAD movement
git reflog
# → HEAD@{0}: reset: moving to HEAD~3
# → HEAD@{1}: commit: Add important feature
# → HEAD@{2}: commit: More work
# → HEAD@{3}: commit: The crucial commit we lost

# Recover all three commits by creating a branch at the pre-reset tip
git checkout -b recovery HEAD@{1}
# or, instead, bring back a single commit onto the current branch:
git cherry-pick HEAD@{3}
```

***

#### Setting Up a Shared Repository on GitHub

```bash
# Initial setup (once)
git init
git add .
git commit -m "Initial commit"
git branch -M main         # make sure the branch is named main (older Git defaults to master)
git remote add origin git@github.com:yourname/project.git
git push -u origin main

# Day-to-day collaboration loop
git pull --rebase          # get colleagues' changes, replay yours on top
# ... do work ...
git add -p                 # stage selectively
git commit -m "descriptive message"
git push

# When push is rejected (someone pushed first)
git pull --rebase          # incorporate their commits, retry push
git push
```

***

#### Resolving a Merge Conflict

```bash
git merge feature/login
# CONFLICT (content): Merge conflict in src/auth.py
# Automatic merge failed; fix conflicts and then commit.

# Open src/auth.py — it contains conflict markers:
# <<<<<<< HEAD (your branch)
# def login(user, password):
#     return bcrypt.verify(password, user.hash)
# =======
# def login(user, password, mfa_token=None):
#     if not bcrypt.verify(password, user.hash):
#         return False
#     return verify_mfa(user, mfa_token)
# >>>>>>> feature/login (incoming branch)

# Edit the file to the desired final state (remove markers)
vim src/auth.py

# Mark resolved and complete the merge
git add src/auth.py
git commit                 # Git pre-fills the merge commit message
# or: git merge --continue
```

***

#### Bisect to Find a Regression

```bash
# Scenario: tests passed at v2.0 but fail now, 150 commits later.

git bisect start
git bisect bad HEAD              # current HEAD is broken
git bisect good v2.0             # v2.0 tag was good

# Git checks out commit ~75
# Run your test:
python -m pytest tests/test_auth.py -q

git bisect good   # test passes → bug introduced after this
# Git checks out commit ~113
python -m pytest tests/test_auth.py -q

git bisect bad    # test fails → bug is in earlier half
# ... continues until ...
# → "3d7f9a1 is the first bad commit"
# → git show 3d7f9a1   to see exactly what changed

git bisect reset  # back to HEAD
```

***

### Productivity Tricks

#### Essential `.gitconfig` Aliases

```ini
[alias]
    graph  = log --all --graph --decorate --oneline
    st     = status
    co     = checkout
    br     = branch
    last   = log -1 HEAD --stat
    unstage = restore --staged
    undo   = reset HEAD~1
    who    = shortlog -sn
    find   = log -S          # git find "search string"
```

***

#### `git stash` for Context Switching

```bash
# Mid-feature, urgent bug report comes in
git stash push -m "WIP: password reset form"

# Switch context, fix the bug
git switch main
git checkout -b hotfix/sql-injection
# ... fix and commit ...
git switch main
git merge hotfix/sql-injection
git push

# Return to your feature
git switch feature/password-reset
git stash pop
# Back exactly where you left off
```

***

#### Interactive Rebase Cheatsheet

In `git rebase -i`, each line represents a commit. Change the verb:

| Verb      | Action                                             |
| --------- | -------------------------------------------------- |
| `pick`    | Keep commit as-is                                  |
| `reword`  | Keep commit, edit its message                      |
| `edit`    | Pause here to amend the commit                     |
| `squash`  | Combine with previous commit, merge messages       |
| `fixup`   | Combine with previous commit, discard this message |
| `drop`    | Delete this commit entirely                        |
| *(reorder)* | Not a verb: move a line up or down to change commit order |

***

#### Shell Prompt Git Integration

Add to `~/.bashrc` or use a framework plugin — shows current branch and status right in your prompt:

```bash
# For bash: add to ~/.bashrc
parse_git_branch() {
    git branch 2>/dev/null | grep '^*' | sed 's/* //'
}
export PS1="\u@\h \[\033[32m\]\w\[\033[33m\] [\$(parse_git_branch)]\[\033[00m\] $ "

# For zsh + oh-my-zsh: just set the theme, branch is included automatically
# For starship (cross-shell): https://starship.rs — includes git info by default
```

***

### Common Mistakes

#### ❌ Committing Secrets or Large Files

**The problem:** Once a secret (API key, password) or a 500 MB binary is committed and pushed, it's in the history — even if you delete it in the next commit.

**Prevention:**

```bash
# Always use .gitignore
echo ".env" >> .gitignore
echo "*.pem" >> .gitignore

# Check what you're about to commit
git diff --staged

# Review staged files before committing
git status
```

**Cure** (if it's already committed): use `git filter-repo` (which the Git documentation recommends over `git filter-branch`) to rewrite history, then force-push. This is disruptive for collaborators, and the leaked secret must still be rotated: anyone who fetched the old history already has a copy.
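
To see why deleting the file in a later commit is not enough, the sketch below commits a fake key in a scratch repository, removes it, and then reads it back out of the older snapshot. The key and file contents are invented for the demo.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

echo "API_KEY=not-a-real-key" > .env        # fake secret for the demo
git add .env
git commit -q -m "Add config"
leaky_commit=$(git rev-parse HEAD)

git rm -q .env
git commit -q -m "Remove config"

ls -A                                       # .env is gone from the working tree

# ...but the earlier snapshot still contains the blob
recovered=$(git show "$leaky_commit:.env")
echo "$recovered"

# Every commit that ever touched the file
git log --oneline --all -- .env
```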

***

#### ❌ Force-Pushing to a Shared Branch

**Wrong:**

```bash
git push --force origin main   # rewrites remote history for everyone
```

**Correct:**

```bash
# If you must force-push, use --force-with-lease
git push --force-with-lease origin main
# Fails if someone else has pushed since your last fetch
# Protects you from overwriting their work
```

**Better:** Never rebase or amend commits that exist on a shared remote branch. Only rewrite history on your private feature branches.
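
The protection that `--force-with-lease` offers can be reproduced locally. This is a sketch under stated assumptions: a bare repository stands in for the shared remote, and "alice" and "bob" are two hypothetical clones in a temporary directory.

```bash
# Placeholder identity so commits work in a clean environment
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git               # stands in for the shared remote

# Alice publishes the first commit
git clone -q remote.git alice 2>/dev/null
cd alice
echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)     # whatever the default branch is called
git push -q origin "$branch"
cd ..

# Bob clones, commits, and pushes
git clone -q remote.git bob
cd bob
echo "bob's work" >> file.txt
git commit -q -am "Bob: add work"
git push -q origin "$branch"
cd ..

# Alice has not fetched Bob's push; she amends and tries to overwrite the remote
cd alice
echo "v1, amended" > file.txt
git commit -q -a --amend -m "Initial commit (amended)"
if git push -q --force-with-lease origin "$branch" 2>/dev/null; then
    lease_result="pushed"
else
    lease_result="rejected"                 # the remote moved since her last fetch
fi
echo "force-with-lease: $lease_result"
```

The push is refused because the remote branch no longer matches Alice's remote-tracking ref, so Bob's commit survives. A plain `--force` in the same position would have replaced it.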

***

#### ❌ Vague Commit Messages

**Wrong:**

```
fix stuff
wip
asdf
update
```

**Correct:**

```
Fix authentication bypass: validate token before permission check

The previous implementation checked permissions before validating
the token, allowing unauthenticated requests to bypass the check
if they provided a well-formed but expired token.

Fixes #342
```

A good commit message answers: *what* changed and *why*. The diff answers *how*. Use the imperative mood ("Fix", "Add", "Refactor") for the subject. Keep the subject under 72 characters. Add a body if the why isn't obvious.
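
Part of this convention can be checked mechanically. The function below is a hypothetical helper, not something Git ships: it enforces only the subject length and the blank line before the body. Saved as `.git/hooks/commit-msg` (with a final line `check_commit_message "$1"`) and made executable, Git would run it with the path to the message file as its first argument.

```bash
# Hypothetical helper: returns 0 if the message file passes two basic checks
check_commit_message() {
    msg_file=$1
    subject=$(head -n 1 "$msg_file")        # first line is the subject
    second=$(sed -n '2p' "$msg_file")       # second line should be blank

    if [ "${#subject}" -ge 72 ]; then
        echo "Subject is ${#subject} characters; keep it under 72." >&2
        return 1
    fi
    if [ -n "$second" ]; then
        echo "Leave a blank line between the subject and the body." >&2
        return 1
    fi
    return 0
}

# Try it on the message shown above
msg=$(mktemp)
printf 'Fix authentication bypass: validate token before permission check\n\nFixes #342\n' > "$msg"
if check_commit_message "$msg"; then verdict="accepted"; else verdict="rejected"; fi
echo "$verdict"
```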

***

#### ❌ Working Directly on `main`

**Wrong:**

```bash
# All development directly on main
git add .
git commit -m "half-done feature"
git push origin main
```

**Correct:**

```bash
git checkout -b feature/my-change
# ... work ...
git push origin feature/my-change
# Open pull request → review → merge
```

**Why:** Working directly on `main` makes it hard to keep a clean, always-deployable main branch: any in-progress work can break everyone who pulls.

***

#### ❌ `git reset --hard` When You Meant `git restore`

**Wrong:**

```bash
# "I want to discard changes to README.md"
git reset --hard HEAD    # discards ALL uncommitted changes to ALL files!
```

**Correct:**

```bash
git restore README.md    # discard changes to one specific file only
```

`git reset --hard` is a sledgehammer — it resets everything. `git restore` is a scalpel — it targets specific files.
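
The difference is easy to verify in a scratch repository. In this sketch (file names are placeholders) two files are modified, and `git restore` discards the edit to one while leaving the other alone.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

echo "readme v1" > README.md
echo "code v1" > main.py
git add README.md main.py
git commit -q -m "Initial commit"

echo "bad edit" > README.md                 # the change we want to throw away
echo "code v2 (keep me)" > main.py          # the change we want to keep

git restore README.md                       # only README.md goes back to HEAD

readme_after=$(cat README.md)
main_after=$(cat main.py)
remaining=$(git status --short)             # main.py is still listed as modified
echo "$remaining"
```

Running `git reset --hard HEAD` at the same point would have reverted `main.py` as well.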

***

#### ❌ Ignoring Merge Conflicts and Overwriting

**Wrong:**

```bash
# Conflict happened — just take "ours" to make it go away
git checkout --ours conflicted_file.py
git add conflicted_file.py
git commit
# Silently discards the other person's work
```

**Correct:**

```bash
# Open the file, read both versions, understand what each does
# Craft a resolution that correctly integrates both changes
vim conflicted_file.py

# Or use a merge tool for a visual diff
git mergetool
```
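
A full resolution cycle looks roughly like the sketch below. It assumes a scratch repository with a made-up `greeting.txt`; the branch name `formal` and the resolved text are illustrative only. It lists the unmerged files, writes a result that keeps both intentions, and checks that no conflict markers remain before committing.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
base=$(git symbolic-ref --short HEAD)       # whatever the default branch is called

git checkout -q -b formal
echo "Good day" > greeting.txt
git commit -q -am "Make greeting formal"

git checkout -q "$base"
echo "Hello, world" > greeting.txt
git commit -q -am "Greet the world"

# The merge stops with a conflict (non-zero exit status)
git merge formal >/dev/null 2>&1 || true

conflicted=$(git diff --name-only --diff-filter=U)   # files still unmerged
echo "$conflicted"
cat greeting.txt                            # both sides, between conflict markers

# Resolve by hand: keep the formal wording AND the new addressee
echo "Good day, world" > greeting.txt

# Guard: refuse to continue if any markers are left behind
if grep -nE '^(<<<<<<<|=======|>>>>>>>)' greeting.txt; then
    echo "conflict markers still present" >&2
else
    git add greeting.txt
    git commit -q -m "Merge formal: combine greeting changes"
fi
```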

***

### Exercises

#### Beginner Exercises

1. **Initialize and make commits:** Create a directory, run `git init`, create a few files, and make three separate commits, each with a focused, well-written message. After each commit, run `git log --oneline` and observe the growing history.
2. **Explore the object model:**

   ```bash
   git cat-file -t HEAD          # what type is HEAD?
   git cat-file -p HEAD          # print the commit object
   git cat-file -p HEAD^{tree}   # print the tree it points to
   ```

   Trace the chain: commit → tree → blob. Use `git cat-file -p <hash>` at each level.
3. **Stage selectively with `-p`:** Make two unrelated changes to one file (e.g., fix a bug on line 5 and add a feature on line 30). Use `git add -p` to stage only the bugfix hunk, commit it, then stage and commit the feature separately. Verify with `git log --oneline`.
4. **Create an alias:** Add `git graph` as an alias for `git log --all --graph --decorate --oneline` in `~/.gitconfig`. Use it to visualize history after each branching exercise.
5. **Explore `.gitignore`:** Create a project with some `.pyc` files, a `.env` file, and a `build/` directory. Set up a `.gitignore` that excludes all three. Verify with `git status` that they don't appear as untracked.

***

#### Intermediate Exercises

6. **Clone and investigate:**

   ```bash
   git clone https://github.com/missing-semester/missing-semester.git
   cd missing-semester
   git log --all --graph --oneline | head -30   # visualise history
   git log --follow _config.yml                 # history of a specific file
   git blame _config.yml | grep "collections:"  # who last changed that line
   git show <commit-hash>                        # what did that commit do?
   ```
7. **Practice branching and merging (recipe conflict):**
   * Create a repo with a `recipe.txt` file and commit it
   * Create two branches: `git branch salty` and `git branch sweet`
   * In `salty`, change "1 cup sugar" → "1 cup salt" and commit
   * In `sweet`, change "1 cup sugar" → "2 cups sugar" and commit
   * Merge both into `main` — resolve the conflict properly
   * Visualize the result with `git log --graph --oneline`
8. **Stash for context switching:**
   * Make uncommitted changes to a file
   * `git stash push -m "WIP: my changes"`
   * Switch to another branch, make a separate commit, switch back
   * `git stash pop` — verify your original changes are restored
9. **The reflog rescue:** Run `git reset --hard HEAD~2` (undo two commits). Run `git reflog` to find the lost commits. Recover them with `git checkout -b recovery <hash>`.
10. **Interactive rebase:** Make four messy commits (typos, "wip", etc.) on a feature branch. Use `git rebase -i HEAD~4` to squash them into one or two clean, logical commits. Inspect the result with `git log --oneline`.
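
If you get stuck on the reflog rescue (exercise 9), the following sketch shows one possible sequence in a scratch repository. It relies on `HEAD@{1}`, the reflog entry for where `HEAD` pointed just before the most recent move; in a real rescue you would read the hash from the `git reflog` output instead.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

for n in 1 2 3; do
    echo "change $n" >> log.txt
    git add log.txt
    git commit -q -m "Commit $n"
done
lost=$(git rev-parse HEAD)                  # saved only so the demo can verify itself

git reset -q --hard HEAD~2
git log --oneline                           # only "Commit 1" is reachable now

git reflog                                  # every HEAD movement, newest first
found=$(git rev-parse 'HEAD@{1}')           # where HEAD was before the reset

git checkout -q -b recovery "$found"
git log --oneline                           # all three commits are back
recovered_count=$(git rev-list --count recovery)
```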

***

#### Advanced Challenge

11. **`git bisect` to find a regression:** Write a script that produces a specific number (e.g., `./compute.sh` should output `42`). Make a series of \~10 commits where one of them introduces a bug that changes the output. Use `git bisect start`, `git bisect good`, `git bisect bad`, and let Git find the exact commit that broke it. Try automating it with `git bisect run ./test.sh`.
12. **Remove a secret from history:** Add a file `secrets.env` containing a fake API key to a repository and make several more commits. Then use `git filter-repo` (install with `pip install git-filter-repo`) to completely remove `secrets.env` from every commit in history. Verify it's gone with `git log --all --full-history -- secrets.env`.
13. **Build a complete contribution workflow:**
    * Fork the [class website repo](https://github.com/missing-semester/missing-semester)
    * Clone your fork locally
    * Create a feature branch, make a meaningful improvement
    * Rebase onto the latest upstream main before submitting
    * Push to your fork and open a pull request
    * Respond to review feedback with a new commit, then `git rebase -i` to squash it before the PR is merged
14. **Automate bisect:** Write a shell script `test.sh` that exits 0 if a test passes and 1 if it fails. Use `git bisect run ./test.sh` to fully automate the binary search without manually marking commits good/bad.
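
For the two bisect exercises, a possible setup is sketched below. Everything in it is an assumption made for illustration: `compute.sh` prints 42 until the seventh of ten commits, and `test.sh` is kept untracked so it stays in place while Git checks out older commits. The scripts are run through `sh`; `git bisect run ./test.sh` works the same way once the file is executable.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"

# Ten commits; the bug (41 instead of 42) appears in commit 7
for n in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$n" -ge 7 ]; then answer=41; else answer=42; fi
    printf '#!/bin/sh\n# revision %s\necho %s\n' "$n" "$answer" > compute.sh
    git add compute.sh
    git commit -q -m "Commit $n"
done

# Untracked test script: exit 0 means good, exit 1 means bad
cat > test.sh <<'EOF'
#!/bin/sh
[ "$(sh ./compute.sh)" = "42" ]
EOF

first=$(git rev-list --max-parents=0 HEAD)  # the root commit, known to be good
git bisect start HEAD "$first" >/dev/null   # arguments: bad commit, then good commit
git bisect run sh ./test.sh >/dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "First bad commit: $culprit"
git bisect reset >/dev/null 2>&1            # return to the original branch
```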

***

### Summary

| Topic            | Core Idea                                                                  |
| ---------------- | -------------------------------------------------------------------------- |
| **Data model**   | blob (file) + tree (dir) + commit (snapshot) — all content-addressed       |
| **Commit DAG**   | History is a graph, not a list. Branches are cheap labels on nodes.        |
| **Objects**      | Immutable, identified by SHA-1. Changing content = new hash = new object.  |
| **References**   | Mutable human-readable pointers to commits: branches, tags, HEAD           |
| **HEAD**         | "Where you are now" — points to current branch or commit                   |
| **Staging area** | Fine-grained control over what goes into each commit                       |
| **Branching**    | Free, instant — just a reference. Use branches liberally.                  |
| **Merging**      | Integrates histories; merge commit unless fast-forward. Keeps full graph.  |
| **Rebasing**     | Replays commits onto a new base. Linear history. Never on shared branches. |
| **Remotes**      | Another copy of the repo. fetch/push/pull synchronise the two.             |
| **`git stash`**  | Temporary shelf for uncommitted work during context switches               |
| **`git bisect`** | Binary search through commits to find regressions in O(log n)              |
| **`git reflog`** | Recovery lifeline — records every HEAD movement                            |

#### Most Important Commands from This Lecture

```
git init / git clone     – create a repository
git status               – see the three-tree state at a glance
git add / git add -p     – stage all or selective changes
git commit -m            – record a snapshot
git log --graph --oneline – visualise the commit DAG
git diff / --staged      – compare working dir, staging, and HEAD
git branch / git switch  – create and navigate branches
git merge / git rebase   – integrate branches
git stash                – shelve uncommitted work
git bisect               – binary search for regressions
git blame                – who wrote this line
git reflog               – recover from mistakes
git remote / fetch / push / pull – collaborate via remotes
git reset / git restore / git revert – undo at different levels
```

#### What's Next

In **Lecture 6 – Packaging and Shipping Code**, you'll learn how software goes from code on your machine to a deployable, reproducible artifact that can run anywhere — covering dependency management, containers, and CI/CD pipelines.

***

*Source:* [*MIT Missing Semester – Version Control and Git*](https://missing.csail.mit.edu/2026/version-control/) *Licensed under* [*CC BY-NC-SA 4.0*](https://creativecommons.org/licenses/by-nc-sa/4.0)

