How Git Actually Works Internally

TL;DR

Git stores everything as objects (blobs, trees, commits, tags), each identified by the SHA-1 hash of its contents. Branches are just files containing a commit hash. A merge is a commit with more than one parent. Once you see the object model, git becomes predictable.

I used git for two years before I understood what it was actually doing. I knew the commands but couldn't reason about them. Then I read the .git directory and it clicked.

Git is a content-addressable filesystem with a version control interface on top.

The Object Store

Every file, directory, and commit git tracks is an object stored in .git/objects/. There are four types:

blob   — file contents
tree   — directory listing (pointers to blobs and other trees)
commit — snapshot + metadata (author, message, parent commits, pointer to root tree)
tag    — annotated tag (pointer to a commit + tag metadata)

Each object is identified by the SHA-1 hash of its contents, prefixed by a short header that records the object's type and size. The hash is the address.

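# See for yourself
ls .git/objects/
# ab/     ← directory named after the first two hex characters of the hash
# ef/
# info/
# pack/   ← packed objects (git gc packs loose objects)

ls .git/objects/ab/
# cd1234...   ← the remaining 38 characters of the hash

# Read any object (pass the full hash or a unique prefix, without the slash)
git cat-file -t abcd1234...  # type: blob, tree, commit, or tag
git cat-file -p abcd1234...  # pretty-print contents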
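
To make the on-disk layout concrete, here is a minimal Python sketch of how a loose object could be encoded, named, and read back. The function names (encode_object, object_id, loose_path, decode_loose) are my own, not git's, and the sketch assumes SHA-1 object names and ignores pack files entirely.

```python
import hashlib
import zlib


def encode_object(obj_type: str, payload: bytes) -> bytes:
    """Bytes that get hashed and stored: '<type> <size>', a NUL byte, then the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return header + payload


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over header + payload, as 40 hex characters."""
    return hashlib.sha1(encode_object(obj_type, payload)).hexdigest()


def loose_path(oid: str) -> str:
    """First two hex characters name the directory, the other 38 name the file."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def decode_loose(compressed: bytes) -> tuple[str, bytes]:
    """Inflate the bytes of a loose object file and split them into (type, payload)."""
    raw = zlib.decompress(compressed)
    header, _, payload = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload length")
    return obj_type, payload


if __name__ == "__main__":
    payload = b"hello world\n"
    on_disk = zlib.compress(encode_object("blob", payload))  # what a loose file holds
    print(loose_path(object_id("blob", payload)))
    print(decode_loose(on_disk))
```

This is roughly what git cat-file has to undo for a loose object: inflate the file, read the header, hand back the rest.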

Blobs: File Contents

A blob stores raw file contents. Nothing else — no filename, no permissions.

# Hash any content
echo 'hello world' | git hash-object --stdin
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# The SHA changes if content changes by even one byte
echo 'hello world!' | git hash-object --stdin
# (a completely different 40-character hash)

# Two files with identical contents share the same blob
# Git stores it once — free deduplication
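
The deduplication falls out of content addressing, and a toy version shows why. This is a sketch under my own naming (blob_id, BlobStore); it keeps objects in a dict rather than on disk, and assumes SHA-1.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Blob ID: SHA-1 over 'blob <size>', a NUL byte, then the raw bytes."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


class BlobStore:
    """Toy content-addressed store: the key is derived from the value."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content  # same content, same key, stored once
        return oid


if __name__ == "__main__":
    store = BlobStore()
    a = store.put(b"hello world\n")   # contents of one file
    b = store.put(b"hello world\n")   # identical contents of another file
    c = store.put(b"hello world!\n")  # one extra byte
    print(a == b, a == c, len(store.objects))  # True False 2
```

Note that the filename never enters the hash, which is why two differently named files can share a blob.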

Trees: Directory Structure

Trees map names and file modes to blobs and sub-trees:

git cat-file -p HEAD^{tree}
# 100644 blob a5c45e...  .gitignore
# 100644 blob 3b18e5...  README.md
# 100755 blob 89ab12...  build.sh
# 040000 tree 2f9c1d...  src
# ^^^^^^ ^^^^ ^^^^^^^^^  ^^^^^^^^^^
# mode   type SHA        name

The tree object for src/ points to the blobs inside it, and the commit's root tree points to src/ and the top-level files. No object stores a full path: a file's path is just the chain of tree entries that leads to its blob.
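
For the curious, a tree's stored form is compact: each entry is the mode, a space, the name, a NUL byte, and the 20 raw bytes of the child's hash. The sketch below serializes and hashes a tree that way. The names tree_payload and tree_id are hypothetical, and I'm assuming SHA-1 and the stored directory mode 40000 (cat-file pads it to 040000 for display).

```python
import hashlib


def tree_payload(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_oid) entries the way a tree object stores them."""
    def sort_key(entry):
        mode, name, _ = entry
        # git compares directory names as if they ended with "/"
        return (name + "/" if mode == "40000" else name).encode()

    out = b""
    for mode, name, hex_oid in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_oid)
    return out


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    """SHA-1 over 'tree <size>', a NUL byte, then the serialized entries."""
    payload = tree_payload(entries)
    return hashlib.sha1(b"tree %d\x00" % len(payload) + payload).hexdigest()


if __name__ == "__main__":
    blob = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
    listing = [("100755", "build.sh", blob), ("100644", "README.md", blob)]
    renamed = [("100755", "build.sh", blob), ("100644", "README.txt", blob)]
    # Renaming a file changes the tree's ID but leaves the blob's ID alone
    print(tree_id(listing) != tree_id(renamed))
```

Because names live in the tree and contents live in the blob, a rename only produces new trees.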

Commits: Snapshots

A commit is a snapshot, not a diff.

git cat-file -p HEAD
# tree 7f4c9b...        ← root tree for this snapshot
# parent a1b2c3...      ← previous commit (two or more for merges, none for a root commit)
# author Alice <a@x.com> 1743638400 +0000
# committer Alice <a@x.com> 1743638400 +0000
#
# Add new feature

This is why git diff between two commits computes the diff on the fly — git stores complete snapshots, not deltas. (Under the hood, pack files use delta compression, but the object model is snapshots.)
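
A commit object is just those text lines, hashed like any other object. Here is a small sketch of that; commit_id is my own helper name, the author and committer are collapsed into one identity line for brevity, and SHA-1 is assumed.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries


def commit_id(tree: str, parents: list[str], who: str, message: str) -> str:
    """Hash a commit built from a root tree ID, parent IDs, identity line and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parent lines
    lines += [f"author {who}", f"committer {who}", "", message]
    payload = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(payload) + payload).hexdigest()


if __name__ == "__main__":
    alice = "Alice <a@x.com> 1743638400 +0000"
    root = commit_id(EMPTY_TREE, [], alice, "Initial commit")
    child = commit_id(EMPTY_TREE, [root], alice, "Add new feature")
    print(root != child)  # same tree, different metadata, different ID
```

The commit holds one tree ID, not a list of changes, which is the "snapshot, not a diff" point in code form.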

Branches Are Just Files

cat .git/refs/heads/main
# a1b2c3d4e5f6...  ← just a commit SHA

cat .git/HEAD
# ref: refs/heads/main   ← what branch you're on

# Creating a branch:
git branch feature
# Creates: .git/refs/heads/feature containing current HEAD commit SHA

# That's it. A branch is a 41-byte file: 40 hex characters plus a newline.
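
Here is the same idea as a Python sketch that builds a throwaway directory shaped like .git and "branches" inside it. resolve_head and create_branch are hypothetical helpers, the commit ID is made up, and the sketch only understands loose ref files (real repositories may also keep refs in .git/packed-refs).

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit ID (loose refs only; packed-refs is ignored here)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds a commit ID directly


def create_branch(git_dir: Path, name: str) -> Path:
    """Roughly what `git branch <name>` amounts to: write one small file."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_bytes(resolve_head(git_dir).encode() + b"\n")
    return ref


def demo() -> tuple[str, int]:
    """Build a fake .git layout, create a branch, return its content and size."""
    fake_sha = "a1b2c3d4e5f6" + "0" * 28  # made-up 40-character ID
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        (git_dir / "refs" / "heads").mkdir(parents=True)
        (git_dir / "HEAD").write_bytes(b"ref: refs/heads/main\n")
        (git_dir / "refs" / "heads" / "main").write_bytes(fake_sha.encode() + b"\n")
        feature = create_branch(git_dir, "feature")
        return feature.read_text().strip(), feature.stat().st_size


if __name__ == "__main__":
    print(demo())
```

No objects are copied when the branch is created; only a pointer is written.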

When you commit, git:

  1. Reuses the blob objects that git add already wrote for the changed files
  2. Creates new tree objects for affected directories, built from the index
  3. Creates a commit object pointing to the new root tree, with the current HEAD commit as parent
  4. Updates .git/refs/heads/main to point to the new commit

Before:                     After:
HEAD → main → C1            HEAD → main → C2 → C1
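
Step 2 is the interesting one: the index is a flat list of paths, but trees are nested. The sketch below turns one into the other. write_tree and hash_object here are my own toy functions (named after the git commands they imitate), every file gets mode 100644 for simplicity, and objects go into a dict instead of .git/objects.

```python
import hashlib


def hash_object(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, then the payload."""
    return hashlib.sha1(f"{kind} {len(payload)}".encode() + b"\x00" + payload).hexdigest()


def write_tree(index: dict[str, str], store: dict[str, bytes]) -> str:
    """Turn flat 'path -> blob ID' entries into nested trees; return the root tree ID."""
    files, subdirs = {}, {}
    for path, oid in index.items():
        first, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(first, {})[rest] = oid  # belongs to a sub-tree
        else:
            files[first] = oid
    entries = [("100644", name, oid) for name, oid in files.items()]
    entries += [("40000", name, write_tree(sub, store)) for name, sub in subdirs.items()]
    # git orders entries by name, treating directory names as if they ended in "/"
    entries.sort(key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode())
    payload = b"".join(f"{m} {n}".encode() + b"\x00" + bytes.fromhex(o) for m, n, o in entries)
    oid = hash_object("tree", payload)
    store[oid] = payload
    return oid


if __name__ == "__main__":
    store = {}
    index = {
        "README.md": hash_object("blob", b"docs\n"),
        "src/app.py": hash_object("blob", b"print('v1')\n"),
        "docs/guide.md": hash_object("blob", b"guide\n"),
    }
    before = write_tree(index, store)
    index["src/app.py"] = hash_object("blob", b"print('v2')\n")  # like `git add src/app.py`
    after = write_tree(index, store)
    print(before != after, len(store))  # True 5
```

Five trees, not six: the second snapshot gets a new root and a new src tree, while the untouched docs tree hashes to the same ID and is shared.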

The DAG (Directed Acyclic Graph)

Commits form a directed acyclic graph. Each commit points to its parent(s), and nothing points the other way: from any commit you can walk back through its entire history, but a commit does not record its children.

A ← B ← C ← D  (main)
        ↑
        E ← F  (feature)

git merge feature:
A ← B ← C ← D ← M  (main)
        ↑       │
        E ← F ←─┘

M is a merge commit with two parents: D and F

Fast-forward merge (no divergence):

A ← B ← C  (main)
        ↑
        D ← E  (feature)

git merge feature:
A ← B ← C ← D ← E  (main = feature, no merge commit needed)

Git just moves the branch pointer forward. That's why it's called fast-forward.
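
The choice between the two kinds of merge comes down to an ancestry check. Here is a toy version over a dict of parent pointers. The function names and the commit name "M" are placeholders of mine, and the sketch only tracks the graph; it does not merge any file contents.

```python
def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """Every commit reachable from start by following parent pointers (inclusive)."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge(parents, branches, into, other, merge_name="M") -> str:
    """Update branches[into] the way a merge would and report which case applied."""
    ours, theirs = branches[into], branches[other]
    if theirs in ancestors(parents, ours):
        return "already up to date"
    if ours in ancestors(parents, theirs):
        branches[into] = theirs            # just move the pointer
        return "fast-forward"
    parents[merge_name] = [ours, theirs]   # new commit with two parents
    branches[into] = merge_name
    return "merge commit"


if __name__ == "__main__":
    diverged = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"], "F": ["E"]}
    refs = {"main": "D", "feature": "F"}
    print(merge(diverged, refs, "main", "feature"), diverged["M"])  # merge commit ['D', 'F']

    linear = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
    refs = {"main": "C", "feature": "E"}
    print(merge(linear, refs, "main", "feature"), refs["main"])     # fast-forward E
```

In the fast-forward case no new object is created at all, which matches the diagram: main simply ends up where feature already was.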

Rebase: Replaying Commits

Before rebase:
main:    A ← B ← C
             ↑
feature:     D ← E

git checkout feature
git rebase main

After rebase:
main:    A ← B ← C
                 ↑
feature:         D' ← E'

D' and E' are new commits with the same changes as D and E.
D' has C as its parent instead of B, and E' has D' as its parent instead of D.

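Rebase rewrites history. D' has a different SHA than D even when the file changes are identical, because the parent hash (and the committer timestamp) is part of what gets hashed. This is why you should not rebase commits that other people have already built on: their branches still point at the old D and E.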
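
A toy replay makes the "new commits" part visible. In this sketch a commit is reduced to a parent ID plus a label standing in for its change, which is a simplification of mine: real commits store snapshots, and real rebase works out each change by comparing a commit with its parent. make_commit and rebase are hypothetical names.

```python
import hashlib


def make_commit(store: dict, parent, change: str) -> str:
    """Toy commit: the ID is a hash over the parent ID and a label for the change."""
    oid = hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()
    store[oid] = {"parent": parent, "change": change}
    return oid


def rebase(store: dict, branch_tip: str, old_base: str, new_base: str) -> str:
    """Replay the commits after old_base on top of new_base; return the new tip."""
    todo = []
    commit = branch_tip
    while commit != old_base:          # walk back from the tip to the old base
        todo.append(commit)
        commit = store[commit]["parent"]
    tip = new_base
    for old in reversed(todo):         # oldest first
        tip = make_commit(store, tip, store[old]["change"])
    return tip


if __name__ == "__main__":
    store = {}
    a = make_commit(store, None, "A")
    b = make_commit(store, a, "B")
    c = make_commit(store, b, "C")     # main
    d = make_commit(store, b, "D")
    e = make_commit(store, d, "E")     # feature, branched from B
    new_tip = rebase(store, e, b, c)
    print(new_tip != e, store[new_tip]["change"], e in store)  # True E True
```

The last value printed is worth noticing: the original E is still in the store after the rebase. Nothing was modified in place, the branch just points somewhere new.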

The Index (Staging Area)

.git/index is the index — a binary file representing the next commit.

# View index contents
git ls-files --stage
# 100644 a5c45e... 0  .gitignore
# 100644 3b18e5... 0  README.md

# git add updates the index:
git add file.txt
# Writes file.txt blob to object store, updates index to point to it

# git commit reads the index:
# Creates tree from index, creates commit pointing to that tree

git diff (no args) shows unstaged changes — working tree vs index. git diff --cached shows staged changes — index vs last commit.
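
One way to hold the three states in your head is as three path-to-blob mappings. The sketch below models them as plain dicts, with short strings standing in for blob IDs. The function names are mine, and it reports only which paths differ, not the line-level diff.

```python
def unstaged(index: dict[str, str], worktree: dict[str, str]) -> list[str]:
    """Like `git diff`: tracked paths whose working-tree content differs from the index."""
    return sorted(p for p in index if index[p] != worktree.get(p))


def staged(head: dict[str, str], index: dict[str, str]) -> list[str]:
    """Like `git diff --cached`: paths whose index entry differs from the last commit."""
    return sorted(p for p in head.keys() | index.keys() if head.get(p) != index.get(p))


def add(index: dict[str, str], worktree: dict[str, str], path: str) -> None:
    """Like `git add <path>`: copy the working-tree version into the index."""
    index[path] = worktree[path]


if __name__ == "__main__":
    head = {"README.md": "r1", "file.txt": "f1"}       # last commit
    index = dict(head)                                 # nothing staged yet
    worktree = {"README.md": "r2", "file.txt": "f2"}   # both files edited

    print(unstaged(index, worktree), staged(head, index))  # ['README.md', 'file.txt'] []
    add(index, worktree, "file.txt")
    print(unstaged(index, worktree), staged(head, index))  # ['README.md'] ['file.txt']
```

After the add, file.txt moves from one list to the other, which is all "staging" means.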

Tags Are Permanent Commit Labels

# Lightweight tag: just a ref (like a branch that doesn't move)
cat .git/refs/tags/v1.0
# a1b2c3...  ← commit SHA

# Annotated tag: a full object with metadata
git cat-file -p v1.0-annotated
# object a1b2c3...
# type commit
# tag v1.0-annotated
# tagger Alice <a@x.com> 1743638400 +0000
#
# Release 1.0

Annotated tags (created with -a) are full objects with their own SHA, and can optionally be signed with -s. They're the recommended way to tag releases.
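
The difference between the two kinds of tag is one extra hop. In this sketch a lightweight tag ref holds the commit ID directly, while an annotated tag ref holds the ID of a tag object that in turn names the commit. tag_object and peel are hypothetical helpers, the commit ID is made up, and SHA-1 is assumed.

```python
import hashlib


def tag_object(target: str, name: str, tagger: str, message: str) -> tuple[str, bytes]:
    """Build an annotated tag object; return (tag ID, payload)."""
    payload = (
        f"object {target}\ntype commit\ntag {name}\ntagger {tagger}\n\n{message}\n"
    ).encode()
    oid = hashlib.sha1(b"tag %d\x00" % len(payload) + payload).hexdigest()
    return oid, payload


def peel(ref_value: str, tags: dict[str, bytes]) -> str:
    """Follow tag objects until the ID no longer names one."""
    while ref_value in tags:
        first_line = tags[ref_value].split(b"\n", 1)[0]  # b"object <id>"
        ref_value = first_line.decode().split(" ", 1)[1]
    return ref_value


if __name__ == "__main__":
    commit = "a1b2c3" + "0" * 34  # made-up commit ID
    tag_id, payload = tag_object(
        commit, "v1.0-annotated", "Alice <a@x.com> 1743638400 +0000", "Release 1.0"
    )
    refs = {"v1.0": commit, "v1.0-annotated": tag_id}
    tags = {tag_id: payload}
    # Both refs end up at the same commit; only one goes through a tag object
    print(peel(refs["v1.0"], tags) == peel(refs["v1.0-annotated"], tags))  # True
```

The tag object is where the tagger, date, and message live, which a bare ref has no room for.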

What This Explains

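Why SHA collisions are catastrophic: If two different objects produce the same SHA, git can't tell them apart, because the hash is the only address an object has. A practical SHA-1 collision was demonstrated in 2017 (the SHAttered attack). Git responded by switching to a collision-detecting SHA-1 implementation, and newer versions can also create repositories that use SHA-256 object names.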

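Why git stash works like it does: Stash creates ordinary commits (one for the index state and one for the working tree, plus a third for untracked files if you ask for them), points refs/stash at the result, and resets the index and working tree to HEAD. git stash pop re-applies the stashed changes on top of your current working tree using a merge, then drops the stash entry if that succeeds.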

Why reflog saves you: Every time HEAD moves, git appends to .git/logs/HEAD. Objects that are no longer reachable from any ref aren't deleted immediately — they're garbage collected after a grace period. git reflog lets you find them by hash.

git reflog
# a1b2c3d HEAD@{0}: reset: moving to HEAD~1
# e5f6a7b HEAD@{1}: commit: accidentally deleted feature

# Get back to the "deleted" commit (this detaches HEAD; create a branch to keep it)
git checkout HEAD@{1}
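
The log file behind this is plain text, one line per HEAD movement: old ID, new ID, who, when, then a tab and the message. Below is a small parser sketch for that line format. parse_reflog_line and reflog are my own names, and the sample entries use made-up IDs.

```python
def parse_reflog_line(line: str) -> dict:
    """Split one line of .git/logs/HEAD into its fields."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    identity, timestamp, tz = rest.rsplit(" ", 2)
    return {"old": old, "new": new, "who": identity,
            "time": int(timestamp), "tz": tz, "message": message}


def reflog(text: str) -> list[dict]:
    """Newest entry first, so result[n] corresponds to HEAD@{n}."""
    entries = [parse_reflog_line(line) for line in text.splitlines() if line]
    return list(reversed(entries))  # the file is appended to, oldest first


SAMPLE = (
    f"{'0' * 40} {'a' * 40} Alice <a@x.com> 1743638400 +0000\tcommit (initial): first\n"
    f"{'a' * 40} {'b' * 40} Alice <a@x.com> 1743638500 +0000\tcommit: accidentally deleted feature\n"
    f"{'b' * 40} {'a' * 40} Alice <a@x.com> 1743638600 +0000\treset: moving to HEAD~1\n"
)

if __name__ == "__main__":
    log = reflog(SAMPLE)
    print(log[0]["message"])          # reset: moving to HEAD~1
    print(log[1]["new"] == "b" * 40)  # True: HEAD@{1} still names the "lost" commit
```

The reset only added a line; the entry recording the lost commit's ID is still there, which is what makes recovery possible until garbage collection runs.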

Why rewriting history changes SHAs: A commit's SHA includes its parent SHA. Change any ancestor, and every descendant commit gets a new SHA.

The Bottom Line

Git is a content-addressable object store. Blobs store file contents. Trees store structure. Commits store snapshots and link to parents. Branches are 41-byte files.

The model:

  • Objects are immutable, identified by hash
  • Branches are pointers to commits, not collections of changes
  • Merge creates a commit with multiple parents
  • Rebase replays commits on a new base (new SHAs)
  • Deleted commits live until garbage collection — reflog saves you

Once you see the object graph, git reset, git rebase, and git cherry-pick go from scary to mechanical.