The Directed Acyclic Graph:
An Anatomy of Git

An interactive exploration of the data structures underlying version control.

1. Content Addressable Storage

Git does not store "files" in the traditional sense of a file system. Instead, it acts as a key-value store. The key is the SHA-1 hash of the content, and the value is the content itself.

Before hashing, Git prepends a header to the content: blob {size}\0, where {size} is the content's length in bytes (written in decimal) and \0 is a null byte. The header plus the content forms the "Blob" object.

Because the key is mathematically derived from the content, two identical files will always result in the same internal object, regardless of their filename or location. This is the foundation of Git's deduplication.
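A minimal sketch of this key-value behaviour in Python, assuming an in-memory dict stands in for Git's on-disk .git/objects directory (real Git also zlib-compresses each stored object). The class name `ObjectStore` is hypothetical; the hashing step follows the blob {size}\0 header described above.

```python
import hashlib

class ObjectStore:
    """Hypothetical in-memory content-addressable store (stands in for .git/objects)."""

    def __init__(self):
        self.objects = {}  # key: 40-char hex SHA-1, value: raw content bytes

    @staticmethod
    def blob_id(content: bytes) -> str:
        # Git hashes the header "blob {size}\0" followed by the content itself
        header = b"blob " + str(len(content)).encode() + b"\0"
        return hashlib.sha1(header + content).hexdigest()

    def put(self, content: bytes) -> str:
        key = self.blob_id(content)
        self.objects[key] = content  # identical content maps to the same key, so it is stored once
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ObjectStore()
# Two "files" with different names but identical content; names never reach the store
key_a = store.put(b"same bytes\n")   # e.g. docs/a.txt
key_b = store.put(b"same bytes\n")   # e.g. backup/copy.txt
print(key_a == key_b, len(store.objects))  # True 1
```

In a real repository, `git hash-object <file>` prints the same kind of Blob ID for a file's contents without needing a filename in the hash.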

Changing a single character of the content changes the hash completely (the avalanche effect), and changing it back restores the original hash. For example, this is the Blob object for a one-line program:

blob 21\0print("hello world");
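To make the avalanche effect measurable, this Python sketch hashes the object shown above and a one-character variant, then counts how many of the 160 output bits differ. The variant string is an arbitrary illustration; for a well-behaved hash roughly half the bits are expected to flip, but the exact count depends on the input, so none is claimed here.

```python
import hashlib

def blob_sha1(text: str) -> bytes:
    """Raw 20-byte SHA-1 of a Git blob: header 'blob {size}\\0' plus the UTF-8 content."""
    content = text.encode("utf-8")
    return hashlib.sha1(b"blob %d\0" % len(content) + content).digest()

def differing_bits(a: bytes, b: bytes) -> int:
    # XOR each pair of bytes and count the bits that are set
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = blob_sha1('print("hello world");')   # 21 bytes, so the header is "blob 21\0"
changed = blob_sha1('print("hello World");')    # one character changed
restored = blob_sha1('print("hello world");')   # changed back

print(original.hex())
print(changed.hex())
print(differing_bits(original, changed), "of 160 bits differ")
print(original == restored)  # True: same content, same hash
```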

2. The Merkle Tree

We have Blobs to store content, but we have lost the filenames. To reconstruct a filesystem, Git uses Tree objects.

A Tree maps human-readable names to the Blob hashes generated in the previous section. It is a simple list: as `git ls-tree` displays it, each entry shows a mode (permissions, such as 100644 for a regular file), an object type (blob or tree), a hash, and a filename.

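This structure is recursive: a Tree can point to Blobs (files) or to other Trees (subdirectories). Because each Tree's hash is computed over its entries, including its children's hashes, a change to any file changes the hash of every Tree above it, all the way up to the root. This is what makes it a Merkle Tree.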

Adding a file to the working directory therefore involves two steps: Git first creates a Blob for the file's content, and then creates a Tree object that points to that Blob under the file's name.
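The recursive Blob/Tree structure can be sketched in Python. This assumes a directory is given as a nested dict (file name to bytes, directory name to dict) and treats every file as a regular, non-executable file (mode 100644); symlinks, executables, and submodules are ignored. Each entry is serialized as `{mode} {name}\0` followed by the child's raw 20-byte hash, with the directory mode written as 40000, which is intended to mirror Git's raw tree encoding for this simple case. `hash_tree` is a hypothetical helper name.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a Git-style object: '{type} {size}\\0' followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()

def hash_tree(directory: dict) -> bytes:
    """Hash a nested dict {name: bytes | dict} as a Tree of Blobs and sub-Trees."""
    def sort_key(name: str) -> str:
        # Git orders entries as if directory names ended with '/'
        return name + "/" if isinstance(directory[name], dict) else name

    body = b""
    for name in sorted(directory, key=sort_key):
        value = directory[name]
        if isinstance(value, dict):
            mode, child = b"40000", hash_tree(value)              # subdirectory -> Tree
        else:
            mode, child = b"100644", hash_object("blob", value)   # file -> Blob
        body += mode + b" " + name.encode() + b"\0" + child
    return hash_object("tree", body)

v1 = {"README": b"hi\n", "src": {"main.py": b"print(1)\n"}}
v2 = {"README": b"hi\n", "src": {"main.py": b"print(2)\n"}}  # one nested file edited

print(hash_tree({}).hex())                           # → 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(hash_tree(v1["src"]) != hash_tree(v2["src"]))  # True: the subtree hash changes
print(hash_tree(v1) != hash_tree(v2))                # True: and so does the root
```

The untouched README Blob keeps the same hash in both versions; only the Trees on the path from the edited file to the root change.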


3. The Snapshot (Commit)

A commit does not contain file diffs. It is a wrapper object that points to a specific Tree (the root of the project) and adds metadata: author, committer, timestamps, and message.

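Crucially, a commit also records the hash of its parent (the very first commit has none, and a merge commit has two or more). Following these parent links walks backwards in time, like a linked list.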

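$$\text{Hash}_{\text{commit}} = \text{SHA1}(\text{tree} + \text{parent} + \text{author} + \text{message})$$

Here + denotes concatenation. This is simplified: the real input also includes the committer and a commit {size}\0 header, just as Blobs carry a blob header.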

Because the parent hash is part of the input for the new hash, you cannot change an old commit without changing the hash of every commit that descends from it. Existing commits are therefore immutable: "rewriting" history really means creating new commits.
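A Python sketch of this chain effect. It assumes a simplified commit body (a tree line, an optional parent line, author and committer lines with a fixed +0000 timezone, a blank line, then the message) hashed with the same {type} {size}\0 header scheme as Blobs; signatures and other extra headers are omitted. The author identity, timestamps, and helper names are hypothetical placeholders, and `EMPTY_TREE` is the well-known hash of an empty Tree. Editing only the first commit's message changes every hash after it.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of an empty Tree object

def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def hash_commit(tree, parent, author, timestamp, message):
    """Hex SHA-1 of a simplified commit object; the first commit has parent=None."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the parent's hash is part of the hashed input
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")  # simplification: committer = author
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return hash_object("commit", body.encode())

def build_history(messages, author="A U Thor <author@example.com>", start=1700000000):
    """Commit each message on top of the previous one; return the hashes, oldest first."""
    hashes, parent = [], None
    for i, message in enumerate(messages):
        parent = hash_commit(EMPTY_TREE, parent, author, start + i, message)
        hashes.append(parent)
    return hashes

original = build_history(["first", "second", "third"])
rewritten = build_history(["first (edited)", "second", "third"])
for old, new in zip(original, rewritten):
    print(old[:7], "->", new[:7])  # every abbreviated hash differs
```

Only the first message was edited, yet the second and third commits change too, because each one's parent line now carries a different hash.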


4. References (Branches)

Commits are immutable, but branches are mutable. A branch is simply a reference: typically a small file (for example .git/refs/heads/main) containing the 40-character hexadecimal SHA-1 hash of a commit.

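HEAD is a special reference that usually points to the name of the current branch rather than directly to a commit. It answers the question: "Where are we right now?" When HEAD instead contains a commit hash directly, Git calls this a "detached HEAD".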

When you commit, Git creates the new commit object (with the current commit as its parent) and then updates the branch that HEAD points to so that it contains the new hash.

.git/HEAD: ref: refs/heads/main
.git/refs/heads/main: ...
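The reference mechanics can be modelled in a few lines of Python. This sketch assumes a dict stands in for the files under .git/ (keys are paths such as HEAD and refs/heads/main), ignores packed refs and reflogs, and uses hypothetical helper names `resolve` and `update_after_commit`. The 40-character hashes are placeholders, not real commits. It also covers the case where HEAD holds a hash directly ("detached HEAD").

```python
def resolve(refs: dict, name: str = "HEAD") -> str:
    """Follow symbolic refs ('ref: ...') until reaching a commit hash."""
    value = refs[name]
    while value.startswith("ref: "):
        value = refs[value[len("ref: "):]]
    return value

def update_after_commit(refs: dict, new_hash: str) -> None:
    """After the commit object is written, move whatever HEAD points at."""
    head = refs["HEAD"]
    if head.startswith("ref: "):
        refs[head[len("ref: "):]] = new_hash  # on a branch: the branch moves, HEAD is unchanged
    else:
        refs["HEAD"] = new_hash               # detached HEAD: HEAD itself holds the hash

refs = {"HEAD": "ref: refs/heads/main", "refs/heads/main": "1" * 40}
update_after_commit(refs, "2" * 40)
print(refs["HEAD"])   # ref: refs/heads/main
print(resolve(refs))  # forty '2' characters: the placeholder for the new commit
```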

5. Divergence: Merge vs. Rebase

When history splits, we must eventually reconcile it. There are two primary ways to do this.

Merge: Creates a new merge commit whose two parents are the tips of both branches. Existing commits are left untouched, so the history preserves exactly what happened and when.

Rebase: Takes the unique commits from one branch and "re-plays" them on top of the other.

When you rebase, the replayed commits get new hash IDs. Since their parent field changed, the SHA-1 input changed, so they are new objects, distinct from the originals.
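A toy model of the two strategies in Python. To keep it short, a commit's ID here is the SHA-1 of its parent IDs and message only, not Git's real commit format, and `make_commit`, `merge`, and `rebase` are hypothetical helpers; the point is only the shape of the graph. Merging adds one commit with two parents and leaves everything else alone, while rebasing creates fresh commits whose new parents give them new IDs.

```python
import hashlib

def make_commit(store: dict, parents: list, message: str) -> str:
    """Toy commit whose ID hashes its parent IDs and message (not Git's real format)."""
    cid = hashlib.sha1(("\n".join(parents) + "\n\n" + message).encode()).hexdigest()
    store[cid] = {"parents": list(parents), "message": message}
    return cid

def merge(store: dict, ours: str, theirs: str) -> str:
    # A single new commit with two parents; existing commits are untouched
    return make_commit(store, [ours, theirs], "Merge")

def rebase(store: dict, commits: list, onto: str) -> list:
    """Replay `commits` (oldest first) on top of `onto` and return the new IDs."""
    new_ids, parent = [], onto
    for cid in commits:
        parent = make_commit(store, [parent], store[cid]["message"])
        new_ids.append(parent)
    return new_ids

store = {}
base = make_commit(store, [], "base")
main = make_commit(store, [base], "work on main")
f1 = make_commit(store, [base], "feature part 1")
f2 = make_commit(store, [f1], "feature part 2")

m = merge(store, main, f2)
print(store[m]["parents"] == [main, f2])  # True: one commit, two parents

r1, r2 = rebase(store, [f1, f2], onto=main)
print(r1 != f1 and r2 != f2)              # True: same messages, new IDs
print(f1 in store and f2 in store)        # True: the originals still exist
```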

Conclusion

Git is a Merkle Directed Acyclic Graph. Commands such as commit, checkout, merge, and rebase are, at heart, graph operations: they add new nodes to the graph or move the references that point into it.

Once you understand that Git maps content to keys, and keys to a graph structure, the "magic" disappears, replaced by a simple, elegant data structure.