How Git Actually Works Internally
TL;DR
Git stores everything as objects (blobs, trees, commits, tags), each identified by its SHA-1 hash. Branches are just files containing a commit hash. A merge is simply a commit with more than one parent. Once you see the object model, git becomes predictable.
I used git for two years before I understood what it was actually doing. I knew the commands but couldn't reason about them. Then I read the .git directory and it clicked.
Git is a content-addressable filesystem with a version control interface on top.
The Object Store
Everything in git is an object stored in .git/objects/. There are four types:
blob — file contents
tree — directory listing (pointers to blobs and other trees)
commit — snapshot + metadata (author, message, parent commits, pointer to root tree)
tag — annotated tag (pointer to a commit + tag metadata)
Each object is identified by the SHA-1 hash of its contents, prefixed with a short header (the object type and its size in bytes). The hash is the address.
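# See for yourself
ls .git/objects/
# ab/  ef/  info/  pack/
# Each two-character directory holds loose objects named by the rest of the hash:
# .git/objects/ab/cd1234... is object abcd1234...
# pack/ ← packed objects (git gc packs loose objects)
# Read any object by its SHA (the slash is not part of it)
git cat-file -t abcd1234... # type: blob, tree, commit, or tag
git cat-file -p abcd1234... # pretty-print contents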
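To see the address-to-path mapping concretely, this sketch writes one blob into a throwaway repository and checks where it lands on disk. It assumes a POSIX shell with git and coreutils on PATH; the content string is arbitrary.

```shell
set -e
# Throwaway repository so nothing touches a real project
repo=$(mktemp -d)
cd "$repo"
git init -q

# -w writes the blob into the object store and prints its SHA
h=$(echo "some content" | git hash-object -w --stdin)

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38>
dir=$(echo "$h" | cut -c1-2)
rest=$(echo "$h" | cut -c3-)
obj=".git/objects/$dir/$rest"
ls "$obj"

# cat-file takes the SHA itself, not the path
git cat-file -t "$h"   # blob
git cat-file -p "$h"   # some content
```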
Blobs: File Contents
A blob stores raw file contents. Nothing else — no filename, no permissions.
# Hash any content
echo "hello world" | git hash-object --stdin
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
# The SHA changes if content changes by even one byte
echo "hello world!" | git hash-object --stdin
# a completely different 40-character SHA
# Two files with identical contents share the same blob
# Git stores it once — free deduplication
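The hash covers more than the raw bytes: git prepends a header made of the object type, a space, the content length in bytes, and a NUL byte. This sketch recomputes a blob's SHA with printf and sha1sum and compares it with git hash-object. It assumes GNU coreutils' sha1sum is available (macOS users would typically substitute shasum); no repository is needed because nothing is written.

```shell
set -e
content="hello world"   # echo adds a newline, so the blob is 12 bytes

# What git computes: SHA-1 over "blob <size>\0<content>"
by_git=$(echo "$content" | git hash-object --stdin)

# The same thing by hand: header, NUL byte, then the content with its newline
size=$(echo "$content" | wc -c | tr -d ' ')
by_hand=$( { printf 'blob %s\0' "$size"; echo "$content"; } | sha1sum | cut -d' ' -f1 )

echo "$by_git"    # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
echo "$by_hand"   # same value

# Identical content always yields the identical SHA, so it is stored once
a=$(printf 'same\n' | git hash-object --stdin)
b=$(printf 'same\n' | git hash-object --stdin)
[ "$a" = "$b" ] && echo "one blob for both files"
```

Because the type is part of the hashed input, a blob and a tree with byte-identical bodies would still get different SHAs.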
Trees: Directory Structure
Trees map names and permissions to blobs and sub-trees:
git cat-file -p HEAD^{tree}
# 100644 blob a5c45e... .gitignore
# 100644 blob 3b18e5... README.md
# 040000 tree 7f4c9b... src
# 100755 blob 89ab12... build.sh
# columns: mode (100644 file, 100755 executable, 040000 directory), object type, SHA of blob/tree, name
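The tree object for src/ points to the blobs (and any sub-trees) inside it, and the commit's root tree points to src/ alongside the top-level files. Nothing is stored by full path: a file's path exists only as the chain of tree entries leading to it from the root.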
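The nesting can be rebuilt by hand with git mktree, which reads ls-tree-style lines (mode, type, SHA, a tab, then a name) and writes a tree object. This sketch builds a src/ subtree, hangs it off a root tree, and lets git ls-tree -r recover full paths by walking the structure. The file names and contents (README.md, src/main.c) are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two blobs with hypothetical contents
readme=$(echo "# demo" | git hash-object -w --stdin)
main=$(echo "int main(void) { return 0; }" | git hash-object -w --stdin)

# Subtree for src/: the entry holds only a name, never a full path
src=$(printf '100644 blob %s\tmain.c\n' "$main" | git mktree)

# Root tree: one file plus the src/ subtree (040000 = directory)
root=$(printf '100644 blob %s\tREADME.md\n040000 tree %s\tsrc\n' "$readme" "$src" | git mktree)

git cat-file -p "$root"             # two entries: README.md and src
git ls-tree -r --name-only "$root"  # README.md, src/main.c (paths rebuilt by walking)
```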
Commits: Snapshots
A commit is a snapshot, not a diff.
git cat-file -p HEAD
# tree 7f4c9b... ← root tree for this snapshot
# parent a1b2c3... ← previous commit (can have 2 for merges, 0 for root)
# author Alice <a@x.com> 1743638400 +0000
# committer Alice <a@x.com> 1743638400 +0000
#
# Add new feature
This is why git diff between two commits computes the diff on the fly — git stores complete snapshots, not deltas. (Under the hood, pack files use delta compression, but the object model is snapshots.)
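One way to convince yourself that commits are snapshots: make two commits where only one file changes, then compare their root trees. The unchanged file appears in both trees under the same blob SHA (shared, not copied), and git diff derives the change by comparing the two trees when you ask. The scratch repository, file names, and messages are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"

echo "stable" > keep.txt
echo "v1" > change.txt
git add . && git commit -q -m "first"

echo "v2" > change.txt
git commit -q -am "second"

# Each commit's tree lists every file, not just the changed one
git ls-tree HEAD~1
git ls-tree HEAD

# keep.txt resolves to the same blob in both snapshots
old=$(git rev-parse HEAD~1:keep.txt)
new=$(git rev-parse HEAD:keep.txt)

# The diff is computed now, from the two trees
git diff --stat HEAD~1 HEAD
```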
Branches Are Just Files
cat .git/refs/heads/main
# a1b2c3d4e5f6... ← just a commit SHA
cat .git/HEAD
# ref: refs/heads/main ← what branch you're on
# Creating a branch:
git branch feature
# Creates: .git/refs/heads/feature containing current HEAD commit SHA
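# That's it. A branch is a 41-byte file (40 hex characters plus a newline).
# (After git gc, refs may be moved into .git/packed-refs instead.)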
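Because a branch is only a ref file, you can create one without git branch at all. This sketch writes the current commit SHA into .git/refs/heads/ by hand, and git then treats it as a normal branch. It assumes the default "files" ref storage; in real use git branch or git update-ref is the better choice, since they also handle packed refs, locking, and the reflog.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"
git commit -q --allow-empty -m "root"

# Hand-made equivalent of `git branch feature`: the commit SHA plus a newline
git rev-parse HEAD > .git/refs/heads/feature

git branch --list                  # feature shows up next to the current branch
wc -c < .git/refs/heads/feature    # 41
```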
When you commit, git:
- Creates blob objects for changed files (usually already done by git add)
- Creates new tree objects for affected directories
- Creates a commit object pointing to the new root tree, with HEAD as parent
- Updates .git/refs/heads/main to point to the new commit
Before: HEAD → main → C1
After:  HEAD → main → C2 → C1
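These steps map onto plumbing commands. This sketch makes a commit without git commit: git add writes the blob and updates the index, git write-tree turns the index into tree objects, git commit-tree creates the commit with the current commit as parent, and git update-ref moves the branch that HEAD points at. The scratch repository, file name, and messages are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"
git commit -q --allow-empty -m "C1"
c1=$(git rev-parse HEAD)

echo "new content" > file.txt
git add file.txt                               # 1. blob written, index updated
tree=$(git write-tree)                         # 2. tree objects built from the index
c2=$(git commit-tree -p HEAD -m "C2" "$tree")  # 3. commit with HEAD as parent
git update-ref HEAD "$c2"                      # 4. move the branch HEAD points at

git log --oneline                              # C2, then C1
```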
The DAG (Directed Acyclic Graph)
Commits form a directed acyclic graph. Each commit points to its parent(s), so the links only go backward: from any commit you can trace its history, but a commit has no record of the commits that come after it.
A ← B ← C ← D (main)
    ↑
    E ← F (feature)
git merge feature:
A ← B ← C ← D ← M (main)
    ↑           ↑
    E ← F ──────┘
M is a merge commit with two parents: D and F
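To check the two-parent claim, this sketch recreates the diverged history above (commit messages stand in for the letters), merges, and inspects the merge commit's parent lines. The helper mkcommit is hypothetical and gives every commit its own file so the merge is conflict-free; --no-ff forces a merge commit even if a global config prefers fast-forwards.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "a@x.com"

# Hypothetical helper: one new file per commit, named after the commit
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit A; mkcommit B
git checkout -q -b feature
mkcommit E; mkcommit F
git checkout -q main
mkcommit C; mkcommit D

git merge -q --no-ff --no-edit feature   # creates M
git cat-file -p HEAD | grep '^parent'    # two lines: D, then F
```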
Fast-forward merge (no divergence):
A ← B ← C (main)
        ↑
        D ← E (feature)
git merge feature:
A ← B ← C ← D ← E (main = feature, no merge commit needed)
Git just moves the branch pointer forward. That's why it's called fast-forward.
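A fast-forward can be observed directly: merge with --ff-only (which refuses to create a merge commit) and main ends up holding exactly the feature tip's SHA. The branch layout mirrors the diagram above, with empty commits whose messages stand in for the letters.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "a@x.com"

for m in A B C; do git commit -q --allow-empty -m "$m"; done
git checkout -q -b feature
for m in D E; do git commit -q --allow-empty -m "$m"; done
git checkout -q main

before=$(git rev-list --count main)   # 3 commits: A, B, C
git merge -q --ff-only feature        # only moves the main pointer
after=$(git rev-list --count main)    # 5 commits, none of them a merge
```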
Rebase: Replaying Commits
Before rebase:
main:    A ← B ← C
             ↑
feature:     D ← E
git checkout feature
git rebase main
After rebase:
main:    A ← B ← C
                 ↑
feature:         D' ← E'
D' and E' are new commits with the same changes as D and E,
but with C as their parent instead of B.
Rebase rewrites history: D' has a different SHA than D, even if the file changes are identical, because the parent hash changed. This is why you shouldn't rebase branches others have already pulled: their copies still build on the old D and E.
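The "same changes, new SHA" point can be checked with git patch-id, which hashes a commit's diff while ignoring its parent and metadata. This sketch rebases a two-commit feature branch onto main and compares the tip before and after. The layout follows the diagram; mkcommit is a hypothetical helper and the file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "a@x.com"

# Hypothetical helper: one new file per commit
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit A; mkcommit B
git checkout -q -b feature
mkcommit D; mkcommit E
git checkout -q main
mkcommit C

git checkout -q feature
old_e=$(git rev-parse HEAD)
git rebase -q main
new_e=$(git rev-parse HEAD)

# patch-id depends only on the diff, so it survives the rebase
pid() { git show "$1" | git patch-id --stable | cut -d' ' -f1; }
echo "$old_e -> $new_e"    # different commit SHAs
pid "$old_e"; pid "$new_e" # identical patch ids
```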
The Index (Staging Area)
.git/index is the index — a binary file representing the next commit.
# View index contents
git ls-files --stage
# 100644 a5c45e... 0 .gitignore
# 100644 3b18e5... 0 README.md
# git add updates the index:
git add file.txt
# Writes file.txt blob to object store, updates index to point to it
# git commit reads the index:
# Creates tree from index, creates commit pointing to that tree
git diff (no args) shows unstaged changes — working tree vs index.
git diff --cached shows staged changes — index vs last commit.
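Staging really is writing a blob and pointing the index at it. In this sketch a file is staged and then edited again, so the last commit, the index, and the working tree hold three different versions, and each diff command reports one side of that split. Names and contents are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"

echo "v1" > file.txt
git add file.txt && git commit -q -m "v1"   # last commit: v1

echo "v2" > file.txt
git add file.txt                            # index: v2
staged=$(git hash-object file.txt)          # SHA of the v2 blob
echo "v3" > file.txt                        # working tree: v3

git ls-files --stage file.txt   # 100644 <SHA of v2> 0  file.txt
git diff --cached               # v1 -> v2 (index vs last commit)
git diff                        # v2 -> v3 (working tree vs index)
```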
Tags Are Permanent Commit Labels
# Lightweight tag: just a ref (like a branch that doesn't move)
cat .git/refs/tags/v1.0
# a1b2c3... ← commit SHA
# Annotated tag: a full object with metadata
git cat-file -p v1.0-annotated
# object a1b2c3...
# type commit
# tag v1.0-annotated
# tagger Alice <a@x.com> 1743638400 +0000
#
# Release 1.0
Annotated tags have their own SHA and record who tagged what and when; they can also be GPG-signed (git tag -s). They're the correct way to tag releases.
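The difference shows up with git cat-file -t: a lightweight tag resolves straight to a commit, while an annotated tag is its own tag object that in turn points at the commit. The tag names mirror the examples above; the scratch repository is only for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"
git commit -q --allow-empty -m "Release commit"

git tag v1.0                                # lightweight: just a ref
git tag -a v1.0-annotated -m "Release 1.0"  # annotated: a separate tag object

git cat-file -t v1.0                        # commit
git cat-file -t v1.0-annotated              # tag
git rev-parse "v1.0-annotated^{commit}"     # peels to the same commit as v1.0
```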
What This Explains
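Why SHA collisions are catastrophic: If two different objects produce the same SHA, git can't tell them apart, because the hash is the address. Practical SHA-1 collisions were demonstrated in 2017 (the SHAttered attack); git now uses a collision-detecting SHA-1 implementation and also supports SHA-256 repositories (git init --object-format=sha256) to mitigate the risk.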
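Why git stash works like it does: Stash records your changes as commits (one for the index, one for the working tree, plus a third for untracked files with -u), points refs/stash at the working-tree commit, and resets the working tree and index to HEAD. git stash pop applies those changes back, much like a cherry-pick, then drops the entry.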
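Stash's commits can be inspected like any others. After stashing a staged change, refs/stash points at a commit whose first parent is the HEAD you were on and whose second parent is a separate commit recording the index. This sketches default git stash behavior in a scratch repository with placeholder content.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"
echo "base" > file.txt
git add file.txt && git commit -q -m "base"

echo "work in progress" > file.txt
git add file.txt
git stash -q                                  # working tree and index reset to HEAD

during=$(cat file.txt)                        # "base"
parents=$(git cat-file -p stash | grep -c '^parent')   # 2
first=$(git rev-parse "stash^1")              # the HEAD commit you stashed from
idx=$(git show "stash^2:file.txt")            # staged content, kept in the index commit

git stash pop -q                              # reapply the changes, drop the entry
```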
Why reflog saves you: Every time HEAD moves, git appends to .git/logs/HEAD. Objects that are no longer reachable from any ref aren't deleted immediately — they're garbage collected after a grace period. git reflog lets you find them by hash.
git reflog
# HEAD@{0}: reset: moving to HEAD~1
# HEAD@{1}: commit: Add new feature   ← the commit the reset discarded
git reset --hard HEAD@{1}   # move the branch back to the "deleted" commit
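The recovery walk-through, end to end in a scratch repository: commit, throw the commit away with a hard reset, then use the reflog entry to put it back. HEAD@{1} means "where HEAD was one move ago"; the commit message is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice"
git config user.email "a@x.com"
git commit -q --allow-empty -m "base"
git commit -q --allow-empty -m "Add new feature"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1       # the commit is no longer on any branch
git reflog -n 2                  # HEAD@{0}: reset..., HEAD@{1}: commit: Add new feature
git reset -q --hard "HEAD@{1}"   # the branch points at the "lost" commit again
```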
Why rewriting history changes SHAs: A commit's SHA includes its parent SHA. Change any ancestor, and every descendant commit gets a new SHA.
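A small experiment isolates the parent's effect: git commit-tree runs twice with the same tree, message, identity, and pinned timestamps, changing only the -p parent. The two child SHAs differ, and anything built on top of them would differ too. The fixed GIT_AUTHOR_DATE and GIT_COMMITTER_DATE values are arbitrary and only there to take time out of the equation.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Pin identity and time so only the parent can vary
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=a@x.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=a@x.com
export GIT_AUTHOR_DATE="1743638400 +0000" GIT_COMMITTER_DATE="1743638400 +0000"

tree=$(git write-tree)                        # empty tree from the empty index
p1=$(git commit-tree -m "parent one" "$tree")
p2=$(git commit-tree -m "parent two" "$tree")

# Same tree, message, and metadata; only the parent differs
x=$(git commit-tree -p "$p1" -m "child" "$tree")
y=$(git commit-tree -p "$p2" -m "child" "$tree")
echo "$x"; echo "$y"                          # two different SHAs

# Same inputs, same parent: the SHA is reproduced exactly
z=$(git commit-tree -p "$p1" -m "child" "$tree")
```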
The Bottom Line
Git is a content-addressable object store. Blobs store file contents. Trees store structure. Commits store snapshots and link to parents. Branches are 41-byte files.
The model:
- Objects are immutable, identified by hash
- Branches are pointers to commits, not collections of changes
- Merge creates a commit with multiple parents
- Rebase replays commits on a new base (new SHAs)
- Deleted commits live until garbage collection — reflog saves you
Once you see the object graph, git reset, git rebase, and git cherry-pick go from scary to mechanical.