Postgres Full-Text Search: Skip Elasticsearch for Most Apps

TL;DR

PostgreSQL full-text search uses tsvector and tsquery. Index with GIN indexes. Use ts_rank for relevance, websearch_to_tsquery for user input, and pg_trgm for fuzzy matching. Handles millions of records well with no extra infrastructure.

Our search was broken. ILIKE queries on a 2M row table took 8 seconds. We planned to add Elasticsearch. Then I spent a day reading the PostgreSQL docs instead.

Two GIN indexes and a tsvector column later: 50ms search, relevance ranking, highlighted snippets, and zero new infrastructure. Elasticsearch can wait until you actually need it.

The Problem with LIKE Queries

-- BAD: ILIKE can't use normal indexes, scans entire table
SELECT * FROM articles
WHERE title ILIKE '%postgresql%'
   OR content ILIKE '%postgresql%';

-- 8 seconds on 2M rows
-- Gets slower linearly with table size
-- No relevance ranking
-- No word stemming ('postgresql' won't match 'postgresq')

How PostgreSQL Full-Text Search Works

Two types power everything:

  • tsvector: Preprocessed document (normalized words + positions)
  • tsquery: Search query
-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- Stopwords removed, words stemmed

-- Convert search to tsquery
SELECT to_tsquery('english', 'jumping & dogs');
-- 'jump' & 'dog'  ← Both stemmed to match variants

-- Test if document matches query
SELECT to_tsvector('english', 'The dog is jumping') @@ to_tsquery('english', 'dog & jump');
-- true

The @@ operator checks if a tsvector matches a tsquery.

Setting Up Full-Text Search

-- Add a generated tsvector column (auto-updates on row changes)
ALTER TABLE articles ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(author, '')), 'C')
    ) STORED;

-- setweight assigns importance: A > B > C > D
-- title matches rank higher than body matches
-- STORED: computed and stored on write (fast reads)

-- Index it for fast search
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

Now search is fast:

SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

-- GIN index used, ~5ms for 2M rows

Handling User Input

Real users don't write boolean queries. Use websearch_to_tsquery:

-- Understands natural language input
SELECT websearch_to_tsquery('english', 'postgresql performance');
-- 'postgresql' & 'performance'

SELECT websearch_to_tsquery('english', '"exact phrase" OR alternative');
-- 'exact' <-> 'phrase' | 'alternative'

SELECT websearch_to_tsquery('english', 'postgresql -mysql');
-- 'postgresql' & !'mysql'
-- Production search query with snippets and pagination
SELECT
    id,
    title,
    ts_rank(search_vector, query) as relevance,
    ts_headline('english', body, query,
        'MaxWords=50, MinWords=20, StartSel=<mark>, StopSel=</mark>'
    ) as excerpt
FROM articles,
    websearch_to_tsquery('english', $1) query
WHERE search_vector @@ query
  AND published = true
ORDER BY relevance DESC
LIMIT $2 OFFSET $3;

ts_headline generates highlighted snippets with search terms wrapped in your chosen markup.

Relevance Ranking

-- ts_rank: rank by term frequency
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'database') query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Combine relevance with recency
SELECT
    title,
    ts_rank(search_vector, query) * 0.7 +
    (CASE WHEN published_at > NOW() - INTERVAL '7 days' THEN 0.3 ELSE 0 END) as score
FROM articles,
    websearch_to_tsquery('english', 'database') query
WHERE search_vector @@ query
ORDER BY score DESC;

Fuzzy Search with pg_trgm

For typo tolerance ("postgresq" finding "postgresql"):

-- Enable trigram extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index for fuzzy search
CREATE INDEX idx_articles_title_trgm ON articles USING GIN(title gin_trgm_ops);

-- Find similar titles (% means "similar to")
SELECT title, similarity(title, 'postgresq') as sim
FROM articles
WHERE title % 'postgresq'
ORDER BY sim DESC
LIMIT 10;

-- Combine full-text with fuzzy for best results
SELECT
    title,
    GREATEST(
        ts_rank(search_vector, fts_query),
        similarity(title, $1)
    ) as score
FROM articles
CROSS JOIN LATERAL websearch_to_tsquery('english', $1) as fts_query
WHERE search_vector @@ fts_query
   OR title % $1
ORDER BY score DESC
LIMIT 20;

Application Code

# Python
async def search(query: str, page: int = 1, per_page: int = 20):
    offset = (page - 1) * per_page

    rows = await db.fetch("""
        WITH results AS (
            SELECT
                id,
                title,
                ts_rank(search_vector, tsq) as relevance,
                ts_headline('english', body, tsq,
                    'MaxWords=30, MinWords=15, StartSel=<b>, StopSel=</b>'
                ) as excerpt,
                COUNT(*) OVER() as total_count
            FROM articles,
                websearch_to_tsquery('english', $1) tsq
            WHERE search_vector @@ tsq
              AND published = true
            ORDER BY relevance DESC
        )
        SELECT * FROM results
        LIMIT $2 OFFSET $3
    """, query, per_page, offset)

    return {
        "results": [dict(row) for row in rows],
        "total": rows[0]["total_count"] if rows else 0,
        "page": page,
    }
// Node.js
async function search(query, { page = 1, perPage = 20 } = {}) {
    const offset = (page - 1) * perPage;

    const { rows } = await pool.query(`
        SELECT
            id,
            title,
            ts_rank(search_vector, tsq) as relevance,
            ts_headline('english', body, tsq,
                'MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>'
            ) as excerpt,
            COUNT(*) OVER() as total_count
        FROM articles,
            websearch_to_tsquery('english', $1) tsq
        WHERE search_vector @@ tsq
          AND published = true
        ORDER BY relevance DESC
        LIMIT $2 OFFSET $3
    `, [query, perPage, offset]);

    return {
        results: rows,
        total: rows[0]?.total_count ?? 0,
        page,
    };
}

Backfilling Existing Data

-- If you add the column to an existing table
UPDATE articles
SET search_vector = (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
);

-- For large tables, update in batches to avoid locking
DO $$
DECLARE
    rows_updated INT;
BEGIN
    LOOP
        UPDATE articles
        SET search_vector = (
            setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(body, '')), 'B')
        )
        WHERE id IN (
            SELECT id FROM articles
            WHERE search_vector IS NULL
            LIMIT 1000
        );

        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;

        PERFORM pg_sleep(0.05);  -- Brief pause between batches
    END LOOP;
END $$;

When PostgreSQL Search Is Enough

Use PostgreSQL full-text search when:

  • Your data is already in PostgreSQL
  • You need search + relational filtering together (by date, user, category)
  • Dataset is under ~50M documents
  • You want zero operational complexity

Consider Elasticsearch when:

  • You need sub-millisecond search at massive scale
  • Complex analytics on search results
  • Real-time indexing of millions of documents per hour
  • You have a dedicated infrastructure team to maintain it

The Bottom Line

PostgreSQL full-text search handles most application search needs. No Elasticsearch, no extra servers, no sync jobs.

Key points:

  • Use a GENERATED ALWAYS AS tsvector column with a GIN index
  • Weight columns by importance with setweight (title > body > tags)
  • Use websearch_to_tsquery for user input — it handles natural language
  • Add pg_trgm for typo tolerance
  • Generate highlighted snippets with ts_headline
  • Combine with SQL WHERE clauses for faceted search (filter by category, date, author)

Before adding Elasticsearch, spend a day with the PostgreSQL docs. You probably don't need the extra complexity.