Git Under the Hood

An interactive exploration of the content-addressable filesystem.

1. The Blob

Git is fundamentally a key-value store. It doesn't store files by name; it stores their content under the hash of that content. Each stored piece of content is a Blob.

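When you add a file, Git runs a hash function (SHA-1) over a short header (the word "blob", the content length, and a null byte) followed by the content, producing an effectively unique ID of 40 hexadecimal characters. This ID is the "Key". The content is the "Value".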

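Hash: da39a3ee5e6b4b0d3255bfef95601890afd80709
Path: .git/objects/da/39a3ee5e6b4b0d3255bfef95601890afd80709

The ID shown here is the plain SHA-1 of empty input. The first two characters of an ID name a directory under .git/objects, and the remaining 38 name the file inside it. (Git itself also hashes a short header, so a real empty blob gets a different ID.)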
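
To make the hashing step concrete, here is a minimal Python sketch of how a blob ID can be computed. The function name git_blob_id is made up for this example; the header layout ("blob", a space, the byte length, a null byte) is the part that matters.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to a blob holding `content`."""
    # Git hashes a header ("blob", a space, the byte length, a NUL byte)
    # followed by the raw content, not the content alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))         # ID of an empty blob
    print(git_blob_id(b"hello\n"))  # compare with: echo hello | git hash-object --stdin
    # Plain SHA-1 of the content alone gives a different value.
    print(hashlib.sha1(b"").hexdigest())
```

Comparing the first and last printed values shows why a plain SHA-1 of the content does not match what Git reports for the same content.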

Try typing hello. Note the hash. Delete it, and type hello again. The hash is identical. Git is deterministic.
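
The same experiment can be sketched in Python as a tiny content-addressable store. This is an illustration under simple assumptions, not Git's actual code: store_blob, load_blob and demo_store are hypothetical names, and the store lives in a temporary directory rather than a real .git folder. It does follow Git's loose-object layout: the payload is zlib-compressed and written under a path derived from its hash.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def store_blob(objects_dir: Path, content: bytes) -> str:
    """Write `content` as a loose blob object and return its ID (the key)."""
    payload = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(payload).hexdigest()
    # First two hex characters name the directory, the remaining 38 the file.
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():  # same content -> same key -> nothing new to write
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(payload))
    return object_id


def load_blob(objects_dir: Path, object_id: str) -> bytes:
    """Look up a blob by ID and return its content (the value)."""
    path = objects_dir / object_id[:2] / object_id[2:]
    payload = zlib.decompress(path.read_bytes())
    _header, _, content = payload.partition(b"\0")
    return content


def demo_store():
    """Store "hello" twice; return both IDs and the number of files written."""
    with tempfile.TemporaryDirectory() as tmp:
        objects_dir = Path(tmp) / "objects"
        first = store_blob(objects_dir, b"hello")
        second = store_blob(objects_dir, b"hello")  # "type hello again"
        files = [p for p in objects_dir.rglob("*") if p.is_file()]
        assert load_blob(objects_dir, first) == b"hello"
        return first, second, len(files)
```

Calling demo_store() returns the same ID twice and a file count of one: identical input always maps to the same key, so the second write has nothing to add.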

2. The Tree

Blobs are anonymous. They contain content, but no filenames. To store the directory structure, Git uses a Tree object.

A Tree maps filenames to Blob hashes, and subdirectory names to other Tree hashes. If two files have exactly the same content, their entries point to the same Blob hash, so Git deduplicates storage automatically.
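
A short Python sketch shows the deduplication. It assumes a flat directory of regular, non-executable files (no subdirectories), and the file names and contents are invented for the example. Each tree entry is serialized as a mode, the filename, a null byte, and the raw 20-byte blob ID.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>' + NUL + body."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict) -> bytes:
    """Serialize {filename: blob ID} as tree entries, sorted by name."""
    body = b""
    for name in sorted(entries):
        # 100644 is the mode Git uses for a regular, non-executable file.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(entries[name])
    return body


# Hypothetical working directory: two of the three files are identical.
files = {
    "README.txt": b"hello\n",
    "copy-of-readme.txt": b"hello\n",
    "notes.txt": b"something else\n",
}

blobs = {}    # blob ID -> content (the object store)
entries = {}  # filename -> blob ID (the tree)
for name, content in files.items():
    blob_id = object_id("blob", content)
    blobs[blob_id] = content  # a second identical file overwrites the same key
    entries[name] = blob_id

tree_id = object_id("tree", tree_body(entries))
print(len(files), "files,", len(blobs), "blobs stored")
print("tree", tree_id)
```

The tree lists three names, but the store holds only two blobs, because the two identical files share one key.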

3. The Commit

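A commit freezes a Tree in time by recording its hash. Crucially, it adds metadata (author, timestamp, message) and a pointer to a Parent commit. The first commit has no parent, and a merge commit has two or more.

This creates a chain of commits that behaves like a linked list (strictly, a directed acyclic graph once merges are involved). Because the parent's hash is part of the new commit's data, you cannot change any earlier commit without changing the hash of every commit that descends from it.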
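
The chaining effect can be sketched in Python. The author, timestamps and messages below are invented, and the commit body is a simplified version of what Git writes (tree, optional parent, author, committer, blank line, message), so treat this as an illustration rather than a byte-exact reimplementation.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries
AUTHOR = "Ada Example <ada@example.com>"                  # hypothetical author


def commit_id(tree: str, parent: Optional[str], timestamp: int, message: str) -> str:
    """Hash a commit whose body names its tree, its parent, and its metadata."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the parent's hash is part of the data
    lines.append(f"author {AUTHOR} {timestamp} +0000")
    lines.append(f"committer {AUTHOR} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Original history: root <- child
root = commit_id(EMPTY_TREE, None, 1700000000, "first")
child = commit_id(EMPTY_TREE, root, 1700000060, "second")

# "Edit" the first commit's message, then rebuild the child on top of it.
edited_root = commit_id(EMPTY_TREE, None, 1700000000, "first (edited)")
edited_child = commit_id(EMPTY_TREE, edited_root, 1700000060, "second")

print("child before:", child)
print("child after: ", edited_child)
```

The child's own tree, metadata and message are untouched, yet its hash differs, because the parent line inside it now names a different commit.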

4. Branches

Branches are often misunderstood. A branch is not a container or a copy of your code. A branch is simply a lightweight, movable pointer.

Technically, it's just a text file containing a 40-character commit hash. Moving a branch is instantaneous because it is just a string update.

[Diagram: commits a1b2c, d4e5f, 98a76, with the branch "feature" pointing at d4e5f]
.git/refs/heads/feature contains: d4e5f...
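
As a rough illustration, a branch can be modelled in Python as a one-line file. The helper names are hypothetical, the commit IDs are made-up placeholders, and the sketch works in a temporary directory instead of a real repository. It also assumes the simple loose-file layout; Git can additionally keep refs in a single packed-refs file.

```python
import tempfile
from pathlib import Path


def write_branch(git_dir: Path, name: str, commit_id: str) -> None:
    """Point branch `name` at `commit_id` by writing one line of text."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_id + "\n")


def read_branch(git_dir: Path, name: str) -> str:
    """Return the commit ID that branch `name` currently points at."""
    return (git_dir / "refs" / "heads" / name).read_text().strip()


def demo_branch():
    old = "1" * 40  # made-up placeholder IDs, not real commits
    new = "2" * 40
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / ".git"
        write_branch(git_dir, "feature", old)
        before = read_branch(git_dir, "feature")
        write_branch(git_dir, "feature", new)  # "moving" the branch
        after = read_branch(git_dir, "feature")
    return before, after
```

Moving the branch touches 41 bytes (the hash plus a newline) no matter how large the project is, which is why it costs almost nothing.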

5. The Graph: Merge vs. Rebase

When histories diverge, we have two choices to reconcile them. We can tie them together (Merge) or rewrite history to make it look linear (Rebase).

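Rebase provides a cleaner history, but notice what happens to the commit hashes: they change. Each replayed commit gets a new parent, and since the parent's hash is part of the commit's data, the result is a new commit with a new hash. You are creating new commits, not moving the old ones.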
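
The difference can be sketched with a simplified commit hash in Python. Everything here is an assumption made for illustration: the hash covers only tree, parents and message (no author or dates), and the tree names are placeholder strings rather than real tree IDs. In a real rebase the trees usually change as well; keeping them fixed isolates the effect of the new parent.

```python
import hashlib
from typing import Tuple


def commit_id(tree: str, parents: Tuple[str, ...], message: str) -> str:
    """Simplified commit hash over tree, parents and message only."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message, ""]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Diverged history: main and feature both start from `base`.
base = commit_id("tree-base", (), "base")
main1 = commit_id("tree-main1", (base,), "work on main")
feat1 = commit_id("tree-feat1", (base,), "feature step 1")
feat2 = commit_id("tree-feat2", (feat1,), "feature step 2")

# Merge: one new commit with two parents; feat1 and feat2 are untouched.
merge = commit_id("tree-merged", (main1, feat2), "Merge feature into main")

# Rebase: replay the feature commits on top of main1 instead of base.
feat1_rebased = commit_id("tree-feat1", (main1,), "feature step 1")
feat2_rebased = commit_id("tree-feat2", (feat1_rebased,), "feature step 2")

print("feat1:", feat1[:7], "->", feat1_rebased[:7])
print("feat2:", feat2[:7], "->", feat2_rebased[:7])
print("merge:", merge[:7], "keeps", feat1[:7], "and", feat2[:7], "as they were")
```

After the merge the original feature commits still exist under their old hashes; after the rebase they have been replaced by copies with new ones.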
