Postgres Full-Text Search: Skip Elasticsearch for Most Apps
TL;DR
PostgreSQL full-text search uses tsvector and tsquery. Index with GIN indexes. Use ts_rank for relevance, websearch_to_tsquery for user input, and pg_trgm for fuzzy matching. Handles millions of records well with no extra infrastructure.
Our search was broken. ILIKE queries on a 2M row table took 8 seconds. We planned to add Elasticsearch. Then I spent a day reading the PostgreSQL docs instead.
Two GIN indexes and a tsvector column later: 50ms search, relevance ranking, highlighted snippets, and zero new infrastructure. Elasticsearch can wait until you actually need it.
The Problem with LIKE Queries
-- BAD: ILIKE can't use normal indexes, scans entire table
SELECT * FROM articles
WHERE title ILIKE '%postgresql%'
OR content ILIKE '%postgresql%';
-- 8 seconds on 2M rows
-- Gets slower linearly with table size
-- No relevance ranking
-- No word stemming ('jumping' won't match 'jump')
How PostgreSQL Full-Text Search Works
Two types power everything:
- tsvector: Preprocessed document (normalized words + positions)
- tsquery: Search query
-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- Stopwords removed, words stemmed
-- Convert search to tsquery
SELECT to_tsquery('english', 'jumping & dogs');
-- 'jump' & 'dog' ← Both stemmed to match variants
-- Test if document matches query
SELECT to_tsvector('english', 'The dog is jumping') @@ to_tsquery('english', 'dog & jump');
-- true
The @@ operator checks if a tsvector matches a tsquery.
Setting Up Full-Text Search
-- Add a generated tsvector column (auto-updates on row changes; requires PostgreSQL 12+)
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B') ||
setweight(to_tsvector('english', coalesce(author, '')), 'C')
) STORED;
-- setweight assigns importance: A > B > C > D
-- title matches rank higher than body matches
-- STORED: computed and stored on write (fast reads)
-- Index it for fast search
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);
Now search is fast:
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
-- GIN index used, ~5ms for 2M rows
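If you want to verify the planner is actually using the index, EXPLAIN the query; for a GIN index the plan should show a Bitmap Index Scan on idx_articles_search rather than a sequential scan over the table:

```sql
-- Confirm the GIN index is used (look for "Bitmap Index Scan")
EXPLAIN ANALYZE
SELECT title
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query;
```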
Handling User Input
Real users don't write boolean queries. Use websearch_to_tsquery, which parses natural input and, unlike to_tsquery, never raises a syntax error on malformed queries:
-- Understands natural language input
SELECT websearch_to_tsquery('english', 'postgresql performance');
-- 'postgresql' & 'performance'
SELECT websearch_to_tsquery('english', '"exact phrase" OR alternative');
-- 'exact' <-> 'phrase' | 'alternative'
SELECT websearch_to_tsquery('english', 'postgresql -mysql');
-- 'postgresql' & !'mysql'
-- Production search query with snippets and pagination
SELECT
id,
title,
ts_rank(search_vector, query) as relevance,
ts_headline('english', body, query,
'MaxWords=50, MinWords=20, StartSel=<mark>, StopSel=</mark>'
) as excerpt
FROM articles,
websearch_to_tsquery('english', $1) query
WHERE search_vector @@ query
AND published = true
ORDER BY relevance DESC
LIMIT $2 OFFSET $3;
ts_headline generates highlighted snippets with search terms wrapped in your chosen markup.
Relevance Ranking
-- ts_rank: rank by term frequency
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'database') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Combine relevance with recency
SELECT
title,
ts_rank(search_vector, query) * 0.7 +
(CASE WHEN published_at > NOW() - INTERVAL '7 days' THEN 0.3 ELSE 0 END) as score
FROM articles,
websearch_to_tsquery('english', 'database') query
WHERE search_vector @@ query
ORDER BY score DESC;
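ts_rank also accepts an optional normalization bitmask as a third argument. Flag 1, for instance, divides the rank by 1 + the logarithm of the document length, so long documents can't win purely by repeating terms. A sketch:

```sql
-- Normalize rank by document length (flag 1 = divide by 1 + log(length))
SELECT title, ts_rank(search_vector, query, 1) as rank
FROM articles,
websearch_to_tsquery('english', 'database') query
WHERE search_vector @@ query
ORDER BY rank DESC;
```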
Fuzzy Search with pg_trgm
For typo tolerance ("postgresq" finding "postgresql"):
-- Enable trigram extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Index for fuzzy search
CREATE INDEX idx_articles_title_trgm ON articles USING GIN(title gin_trgm_ops);
-- Find similar titles (% means "similar to")
SELECT title, similarity(title, 'postgresq') as sim
FROM articles
WHERE title % 'postgresq'
ORDER BY sim DESC
LIMIT 10;
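Under the hood, pg_trgm breaks each string into three-character grams (padding with two leading spaces and one trailing space) and scores similarity as shared grams over total distinct grams. A rough Python sketch of the idea, simplified from what the extension actually does (the real implementation also handles multi-word strings and word boundaries):

```python
def trigrams(s: str) -> set[str]:
    """Extract trigrams roughly the way pg_trgm does:
    lowercase, then pad with two leading spaces and one trailing space."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (Jaccard)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(similarity("postgresql", "postgresq"))  # high: a single typo
print(similarity("postgresql", "mysql"))      # low: different words
```

One typo shares most trigrams with the original word, so the score stays high; unrelated words share almost none.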
-- Combine full-text with fuzzy for best results
SELECT
title,
GREATEST(
ts_rank(search_vector, fts_query),
similarity(title, $1)
) as score
FROM articles
CROSS JOIN LATERAL websearch_to_tsquery('english', $1) as fts_query
WHERE search_vector @@ fts_query
OR title % $1
ORDER BY score DESC
LIMIT 20;
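The % operator uses a configurable cutoff (0.3 by default). If fuzzy matches feel too loose or too strict, tune the pg_trgm.similarity_threshold setting, per session or in postgresql.conf:

```sql
-- Stricter fuzzy matching for this session (default is 0.3)
SET pg_trgm.similarity_threshold = 0.45;
SELECT title FROM articles WHERE title % 'postgresq';
```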
Application Code
# Python
async def search(query: str, page: int = 1, per_page: int = 20):
offset = (page - 1) * per_page
rows = await db.fetch("""
WITH results AS (
SELECT
id,
title,
ts_rank(search_vector, tsq) as relevance,
ts_headline('english', body, tsq,
'MaxWords=30, MinWords=15, StartSel=<b>, StopSel=</b>'
) as excerpt,
COUNT(*) OVER() as total_count
FROM articles,
websearch_to_tsquery('english', $1) tsq
WHERE search_vector @@ tsq
AND published = true
ORDER BY relevance DESC
)
SELECT * FROM results
ORDER BY relevance DESC
LIMIT $2 OFFSET $3
""", query, per_page, offset)
return {
"results": [dict(row) for row in rows],
"total": rows[0]["total_count"] if rows else 0,
"page": page,
}
// Node.js
async function search(query, { page = 1, perPage = 20 } = {}) {
const offset = (page - 1) * perPage;
const { rows } = await pool.query(`
SELECT
id,
title,
ts_rank(search_vector, tsq) as relevance,
ts_headline('english', body, tsq,
'MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>'
) as excerpt,
COUNT(*) OVER() as total_count
FROM articles,
websearch_to_tsquery('english', $1) tsq
WHERE search_vector @@ tsq
AND published = true
ORDER BY relevance DESC
LIMIT $2 OFFSET $3
`, [query, perPage, offset]);
return {
results: rows,
total: rows[0]?.total_count ?? 0,
page,
};
}
Backfilling Existing Data
-- A GENERATED column is backfilled automatically when added.
-- If you maintain a plain tsvector column instead, populate it yourself:
UPDATE articles
SET search_vector = (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
);
-- For large tables, update in batches to avoid locking
DO $$
DECLARE
rows_updated INT;
BEGIN
LOOP
UPDATE articles
SET search_vector = (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
)
WHERE id IN (
SELECT id FROM articles
WHERE search_vector IS NULL
LIMIT 1000
);
GET DIAGNOSTICS rows_updated = ROW_COUNT;
EXIT WHEN rows_updated = 0;
PERFORM pg_sleep(0.05); -- Brief pause between batches
END LOOP;
END $$;
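If you're stuck with a plain column (say, on PostgreSQL 11 or older, where generated columns don't exist), a trigger can keep it current. The built-in tsvector_update_trigger is the simplest option, though it can't apply setweight; for weighted vectors you'd write a small custom trigger function instead:

```sql
-- Keep a plain search_vector column updated on INSERT/UPDATE
CREATE TRIGGER articles_search_vector_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body);
```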
When PostgreSQL Search Is Enough
Use PostgreSQL full-text search when:
- Your data is already in PostgreSQL
- You need search + relational filtering together (by date, user, category)
- Dataset is under ~50M documents
- You want zero operational complexity
Consider Elasticsearch when:
- You need sub-millisecond search at massive scale
- Complex analytics on search results
- Real-time indexing of millions of documents per hour
- You have a dedicated infrastructure team to maintain it
The Bottom Line
PostgreSQL full-text search handles most application search needs. No Elasticsearch, no extra servers, no sync jobs.
Key points:
- Use a GENERATED ALWAYS AS tsvector column with a GIN index
- Weight columns by importance with setweight (title > body > tags)
- Use websearch_to_tsquery for user input; it handles natural language
- Add pg_trgm for typo tolerance
- Generate highlighted snippets with ts_headline
- Combine with SQL WHERE clauses for faceted search (filter by category, date, author)
Before adding Elasticsearch, spend a day with the PostgreSQL docs. You probably don't need the extra complexity.