Logging in Production: What to Log, What to Skip
TL;DR
Use structured logging with levels. Log requests, errors, and key events - not sensitive data or debug spam. JSON format for aggregation. Keep performance impact under 5%. Fix log leaks before they become breaches.
I spent three days debugging a production issue. Logs showed nothing. Then I found the problem - someone had set the log level to ERROR to "improve performance." We were blind to everything happening in production.
Two weeks later, our security team found customer credit cards in our logs. A developer had added console.log(req.body) while debugging and forgot to remove it. Logs were being shipped to five different systems. The breach notification cost us six figures.
Logging is either an afterthought or a firehose. Here's what I've learned about production logging from managing systems processing billions of requests.
The Console.log Problem
// Every codebase I've inherited
console.log('user:', user);
console.log('processing payment...');
console.log('data:', JSON.stringify(data));
console.log('HERE!!!'); // debugging from 2 years ago
console.log('wtf why is this broken');
console.log(req.body); // security nightmare
Problems:
- No context (timestamp, level, request ID)
- Can't filter or search effectively
- Logs everything including secrets
- No way to disable in production
- Terrible performance at scale
- Unstructured - can't parse or aggregate
I've seen production systems emitting 10,000 log lines per second, all from console.log. Finding anything in that stream is impossible.
Structured Logging: The Right Way
// Use a proper logging library
const winston = require('winston');
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
transports: [
new winston.transports.Console(),
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' })
]
});
// Now your logs are structured
logger.info('User logged in', {
userId: user.id,
ip: req.ip,
userAgent: req.headers['user-agent']
});
// Output:
// {
// "level": "info",
// "message": "User logged in",
// "timestamp": "2026-01-31T10:30:00.000Z",
// "userId": "usr_123",
// "ip": "203.0.113.1",
// "userAgent": "Mozilla/5.0..."
// }
Structured logs can be:
- Filtered by level
- Searched by field
- Aggregated and analyzed
- Correlated across services
- Alerted on automatically
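For example, with the JSON output above written to combined.log, you can already slice it from the command line. A quick sketch with jq (assuming one JSON object per line, which is what the file transport writes):
# All error-level entries for one user
jq 'select(.level == "error" and .userId == "usr_123")' combined.log
# Count entries by level
jq -r '.level' combined.log | sort | uniq -c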
Log Levels: What They Actually Mean
I see these used wrong constantly:
// ERROR - Something broke, needs immediate attention
logger.error('Database connection failed', {
error: err.message,
host: dbHost,
retryCount: 3
});
// WARN - Something's wrong but we're handling it
logger.warn('Rate limit exceeded', {
userId: user.id,
endpoint: req.path,
limit: 100
});
// INFO - Normal business events worth recording
logger.info('Payment processed', {
orderId: order.id,
amount: order.total,
paymentMethod: 'stripe'
});
// DEBUG - Detailed info for troubleshooting (off in production)
logger.debug('Cache lookup', {
key: cacheKey,
hit: cacheHit,
ttl: ttl
});
// TRACE - Super detailed (almost never needed)
logger.trace('Function entered', {
function: 'processOrder',
args: args
});
My production setup:
- ERROR: Wake me up at 3am
- WARN: Look at it tomorrow morning
- INFO: Normal operations, enabled always
- DEBUG: Disabled in production, enable for troubleshooting
- TRACE: Never used
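One thing that pays off: keep INFO as the default but give yourself a way to turn DEBUG on temporarily without a redeploy. A minimal sketch against the winston logger from earlier (the SIGUSR2 toggle is just one convenient trigger):
// Flip between info and debug at runtime - no restart, no config change
process.on('SIGUSR2', () => {
  logger.level = logger.level === 'debug' ? 'info' : 'debug';
  logger.info('Log level changed', { level: logger.level });
});
Then kill -USR2 <pid> turns debug on while you troubleshoot and off again when you're done.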
What to Actually Log
1. HTTP Requests (With Limits)
// Log every request (summarized)
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
logger.info('HTTP request', {
method: req.method,
path: req.path,
status: res.statusCode,
duration: duration,
ip: req.ip,
requestId: req.id,
// Don't log query params - might contain tokens
// Don't log body - might contain passwords
// Don't log headers - might contain auth tokens
});
});
next();
});
Don't log:
- Full request body (passwords, credit cards)
- Query parameters (tokens, API keys)
- Authorization headers
- Cookies with session data
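If you genuinely need request details, redact them before logging. A minimal sketch - the field list is illustrative, extend it for your own payloads:
const SENSITIVE_FIELDS = ['password', 'cardnumber', 'cvv', 'ssn', 'token', 'authorization'];

function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (!value || typeof value !== 'object') return value;
  return Object.fromEntries(
    Object.entries(value).map(([key, val]) => [
      key,
      SENSITIVE_FIELDS.includes(key.toLowerCase()) ? '[REDACTED]' : redact(val)
    ])
  );
}

// Sensitive fields are masked, everything else passes through
logger.info('Request body (redacted)', { body: redact(req.body) });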
2. Errors (With Context)
// BAD - No context
logger.error('Database error');
// GOOD - Actionable information
try {
await db.query(sql, params);
} catch (err) {
logger.error('Database query failed', {
error: err.message,
stack: err.stack,
query: sql.substring(0, 100), // Truncate long queries
userId: req.user?.id,
requestId: req.id,
// Don't log params - might contain PII
});
throw err;
}
Always include:
- Error message and stack trace
- Request ID for correlation
- User ID (if authenticated)
- What operation failed
- Relevant context (not sensitive data)
3. Business Events
// Track important business operations
logger.info('User registration', {
userId: user.id,
source: 'web',
plan: 'free'
});
logger.info('Subscription upgraded', {
userId: user.id,
fromPlan: 'free',
toPlan: 'pro',
revenue: 29.99
});
logger.info('Payment failed', {
userId: user.id,
orderId: order.id,
amount: order.total,
reason: 'insufficient_funds'
});
These logs become your analytics data source.
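Once these events land in your log aggregator, simple queries turn them into metrics. For example, a CloudWatch Logs Insights query over the payment events above (field names match the log calls; your platform's syntax may differ):
# Failed payments, broken down by reason
fields @timestamp, reason
| filter message = "Payment failed"
| stats count(*) by reason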
4. External API Calls
// Log external service interactions
async function callStripe(endpoint, data) {
const start = Date.now();
try {
const response = await stripe.post(endpoint, data);
logger.info('Stripe API call succeeded', {
endpoint: endpoint,
duration: Date.now() - start,
statusCode: response.status,
requestId: response.headers['request-id']
});
return response.data;
} catch (err) {
logger.error('Stripe API call failed', {
endpoint: endpoint,
duration: Date.now() - start,
error: err.message,
statusCode: err.response?.status,
stripeCode: err.code
});
throw err;
}
}
This helps debug third-party integration issues.
What NOT to Log
Security-Sensitive Data
// NEVER LOG THESE
logger.info('User login', {
email: user.email,
password: password, // ❌ NEVER
creditCard: req.body.cardNumber, // ❌ NEVER
ssn: user.ssn, // ❌ NEVER
apiKey: req.headers.authorization // ❌ NEVER
});
// Instead
logger.info('User login', {
userId: user.id,
// Email is borderline - depends on your threat model
email: user.email,
ipAddress: req.ip
});
Never log:
- Passwords (plain or hashed)
- Credit card numbers
- Social security numbers
- API keys or tokens
- Session IDs
- Private keys or certificates
- Unmasked PII
I've seen all of these in production logs. Don't be that person.
High-Volume Noise
// BAD - Logs every cache hit (10,000/sec)
function getFromCache(key) {
const value = cache.get(key);
logger.debug('Cache lookup', { key, hit: !!value }); // Too much!
return value;
}
// GOOD - Only log cache issues
function getFromCache(key) {
const value = cache.get(key);
// Only log misses for important keys
if (!value && key.startsWith('critical:')) {
logger.warn('Cache miss for critical key', { key });
}
return value;
}
Don't log:
- Every database query (log slow queries)
- Every cache hit (log misses for important data)
- Every function call (use profilers instead)
- Loop iterations
- Successful health checks
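For the database case, the usual compromise is a thin wrapper that only logs when a query is slow. A rough sketch (the 500ms threshold and the db.query shape are assumptions):
const SLOW_QUERY_MS = 500;

async function timedQuery(sql, params) {
  const start = Date.now();
  try {
    return await db.query(sql, params);
  } finally {
    const duration = Date.now() - start;
    if (duration > SLOW_QUERY_MS) {
      logger.warn('Slow query', {
        query: sql.substring(0, 100), // Truncate, and never log params
        duration
      });
    }
  }
}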
Full Objects
// BAD - Giant object in every log
logger.info('Processing order', { order: order }); // 500 fields
// GOOD - Only relevant fields
logger.info('Processing order', {
orderId: order.id,
userId: order.userId,
itemCount: order.items.length,
total: order.total
});
Logging entire objects:
- Bloats log storage
- Slows down logging
- Exposes sensitive data you didn't realize was there
- Makes logs hard to read
Request IDs: The Secret Weapon
const { v4: uuidv4 } = require('uuid');
// Add request ID to every request
app.use((req, res, next) => {
req.id = req.headers['x-request-id'] || uuidv4();
res.setHeader('X-Request-ID', req.id);
next();
});
// Include in every log
logger.info('Processing request', {
requestId: req.id,
path: req.path
});
logger.error('Database error', {
requestId: req.id,
error: err.message
});
Now you can trace a single request across:
- Multiple log entries
- Multiple services
- External API calls
- Database queries
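To make that work across services, forward the ID on every outbound call so the downstream service logs the same one. A minimal sketch using Node's built-in fetch (Node 18+; the header name matches the middleware above):
async function callDownstream(req, url, options = {}) {
  return fetch(url, {
    ...options,
    headers: {
      ...options.headers,
      'X-Request-ID': req.id // Same ID shows up in the downstream service's logs
    }
  });
}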
Pro tip: Return the request ID in error responses:
app.use((err, req, res, next) => {
logger.error('Request failed', {
requestId: req.id,
error: err.message,
stack: err.stack
});
res.status(500).json({
error: 'Internal server error',
requestId: req.id // Users can give you this for debugging
});
});
Performance: Logging Is Not Free
I once had an app that spent 30% of CPU time on logging. Here's what kills performance:
Synchronous Logging
// BAD - Blocks the event loop
function syncLogToFile(message) {
fs.appendFileSync('app.log', message + '\n'); // Blocks!
}
// GOOD - Async logging
const logger = winston.createLogger({
transports: [
new winston.transports.File({
filename: 'app.log',
// Winston writes async by default
})
]
});
Excessive String Concatenation
// BAD - Concatenates even if log level is disabled
logger.debug('User data: ' + JSON.stringify(hugeObject));
// If debug is disabled, you still paid for JSON.stringify!
// GOOD - Guard expensive work behind a level check
if (logger.isLevelEnabled('debug')) {
logger.debug('User data', { user: hugeObject });
}
// BETTER - If your logger supports lazy metadata, pass a callback
// so the object is only built when debug is actually enabled
logger.debug('User data', () => ({ user: hugeObject }));
Stack Trace Capture
// BAD - Captures stack trace for every log
logger.info('Request processed', {
stack: new Error().stack // Expensive!
});
// GOOD - Only capture for errors
logger.error('Request failed', {
error: err.message,
stack: err.stack // Already captured
});
My Performance Rules
- Keep logging under 5% of total CPU
- Buffer logs and flush async
- Sample high-frequency logs
- Disable debug logs in production
- Monitor logging overhead
Sampling: Handle High Volume
// Log 1% of successful requests, 100% of errors
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const shouldLog = res.statusCode >= 400 || Math.random() < 0.01;
if (shouldLog) {
logger.info('HTTP request', {
method: req.method,
path: req.path,
status: res.statusCode,
duration: Date.now() - start,
sampled: res.statusCode < 400 // Mark sampled logs
});
}
});
next();
});
This reduced our log volume by 95% while keeping visibility into problems.
Real-World Configurations
Node.js with Winston
const winston = require('winston');
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: 'api',
version: process.env.APP_VERSION,
environment: process.env.NODE_ENV
},
transports: [
// Console for local dev
new winston.transports.Console({
format: winston.format.combine(
winston.format.colorize(),
winston.format.simple()
)
}),
// File for production
new winston.transports.File({
filename: 'error.log',
level: 'error',
maxsize: 10485760, // 10MB
maxFiles: 5
})
]
});
module.exports = logger;
Python with structlog
import structlog
structlog.configure(
processors=[
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.processors.JSONRenderer()
],
context_class=dict,
logger_factory=structlog.PrintLoggerFactory(),
)
logger = structlog.get_logger()
# Usage
logger.info("user_login", user_id=user.id, ip=request.ip)
logger.error("db_error", error=str(e), query=sql[:100])
Go with zap
package main
import "go.uber.org/zap"
func main() {
logger, _ := zap.NewProduction()
defer logger.Sync()
logger.Info("user_login",
zap.String("user_id", userID),
zap.String("ip", req.RemoteAddr),
)
logger.Error("db_error",
zap.Error(err),
zap.String("query", sql),
)
}
Log Aggregation: Making Logs Useful
Structured logs are pointless if you can't search them:
CloudWatch Logs
const CloudWatchTransport = require('winston-cloudwatch');
logger.add(new CloudWatchTransport({
logGroupName: '/aws/lambda/my-function',
logStreamName: () => {
const date = new Date().toISOString().split('T')[0];
return `${date}-${process.env.AWS_LAMBDA_LOG_STREAM_NAME}`;
},
awsRegion: 'us-east-1'
}));
Search queries:
# Find all errors for a user
fields @timestamp, @message, error
| filter userId = "usr_123" and level = "error"
| sort @timestamp desc
# Slow requests
fields @timestamp, method, path, duration
| filter duration > 1000
| stats avg(duration) by path
ELK Stack (Elasticsearch, Logstash, Kibana)
// Ship logs to Elasticsearch
const ElasticsearchTransport = require('winston-elasticsearch');
logger.add(new ElasticsearchTransport({
level: 'info',
clientOpts: {
node: 'http://localhost:9200'
},
index: 'logs'
}));
Datadog
const datadog = require('winston-datadog');
logger.add(new datadog({
apiKey: process.env.DATADOG_API_KEY,
hostname: 'api-server',
service: 'web-api',
ddsource: 'nodejs'
}));
Context: The Middleware Pattern
// Add context to all logs in a request
const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();
app.use((req, res, next) => {
const context = {
requestId: req.id,
userId: req.user?.id,
ip: req.ip,
path: req.path
};
asyncLocalStorage.run(context, () => next());
});
// Helper to get context
function getLogContext() {
return asyncLocalStorage.getStore() || {};
}
// Now every log automatically includes context
function logWithContext(level, message, extra) {
logger[level](message, {
...getLogContext(),
...extra
});
}
// Usage anywhere in request
logWithContext('info', 'Payment processed', { orderId: order.id });
// Automatically includes requestId, userId, ip, path
Alerts: When to Wake Someone Up
// Example alert rules
const ALERT_RULES = {
// Error rate above 1%
high_error_rate: {
query: 'status >= 500',
threshold: 0.01,
window: '5m',
severity: 'critical'
},
// Slow requests
slow_requests: {
query: 'duration > 5000',
threshold: 0.05, // 5% of requests
window: '10m',
severity: 'warning'
},
// Payment failures
payment_failures: {
query: 'event = "payment_failed"',
threshold: 10, // 10 in 5 minutes
window: '5m',
severity: 'critical'
}
};
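These rules usually live in your log platform (CloudWatch alarms, Datadog monitors), but the logic is just a threshold over a window. A toy sketch of the error-rate rule - the counters and sendPage() are hypothetical stand-ins for your aggregator and paging integration:
function checkErrorRate(requestsInWindow, errorsInWindow) {
  const errorRate = requestsInWindow === 0 ? 0 : errorsInWindow / requestsInWindow;
  if (errorRate > ALERT_RULES.high_error_rate.threshold) {
    // sendPage() is a placeholder for PagerDuty, Opsgenie, etc.
    sendPage(`Error rate ${(errorRate * 100).toFixed(1)}% over the last 5 minutes`);
  }
}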
Alert fatigue is real. I only alert on:
- Error rate above baseline
- External API failures
- Payment processing issues
- Database connection failures
- Disk/memory approaching limits
Don't alert on single errors or expected failures.
Common Mistakes
Mistake 1: Logging Passwords
// BAD
app.post('/login', (req, res) => {
logger.info('Login attempt', { body: req.body });
// Just logged the password!
});
// GOOD
app.post('/login', (req, res) => {
logger.info('Login attempt', {
email: req.body.email,
ip: req.ip
});
});
Mistake 2: Logging in Loops
// BAD - 10,000 log entries
for (const item of items) {
logger.info('Processing item', { id: item.id });
processItem(item);
}
// GOOD - One log entry
logger.info('Processing batch', { count: items.length });
for (const item of items) {
processItem(item);
}
logger.info('Batch complete', { count: items.length });
Mistake 3: No Error Context
// BAD
try {
await processPayment(order);
} catch (err) {
logger.error(err.message); // No context!
}
// GOOD
try {
await processPayment(order);
} catch (err) {
logger.error('Payment processing failed', {
error: err.message,
stack: err.stack,
orderId: order.id,
userId: order.userId,
amount: order.total,
paymentMethod: order.paymentMethod
});
}
Mistake 4: Logging Too Much in Production
// BAD - Debug logs in production
logger.debug('Cache lookup', { key }); // 10,000/sec
logger.debug('Database query', { sql }); // 5,000/sec
logger.debug('Processing item', { item }); // 50,000/sec
// GOOD - Info only in production
if (process.env.NODE_ENV === 'production') {
logger.level = 'info';
}
Mistake 5: No Log Rotation
# Without rotation, logs fill disk
-rw-r--r-- 1 node node 45G Jan 31 10:00 app.log
# With rotation
-rw-r--r-- 1 node node 100M Jan 31 10:00 app.log
-rw-r--r-- 1 node node 100M Jan 30 10:00 app.log.1
-rw-r--r-- 1 node node 100M Jan 29 10:00 app.log.2
Use logrotate or Winston's maxsize/maxFiles.
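A minimal logrotate entry for the file transports above (paths and retention are examples; adjust for your host):
/var/log/myapp/*.log {
  daily
  rotate 7
  maxsize 100M
  compress
  missingok
  notifempty
  copytruncate
}
copytruncate matters here: Node keeps the file handle open, so a plain rename would leave the process writing to the rotated file instead of the new one.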
Security: Preventing Log Injection
// BAD - User input in logs
logger.info(`User ${req.body.username} logged in`);
// If username is: "admin\n{\"level\":\"error\",\"message\":\"System hacked\"}"
// Creates fake log entries!
// GOOD - Structured logging prevents injection
logger.info('User logged in', {
username: req.body.username // Safely escaped in JSON
});
Structured logging (JSON) prevents log injection attacks.
My Current Setup
After running production systems for years:
const winston = require('winston');
const { AsyncLocalStorage } = require('async_hooks');
const { v4: uuidv4 } = require('uuid');
const asyncLocalStorage = new AsyncLocalStorage();
// Create logger
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: process.env.SERVICE_NAME,
version: process.env.VERSION,
env: process.env.NODE_ENV
},
transports: [
new winston.transports.Console(),
new winston.transports.File({
filename: 'error.log',
level: 'error',
maxsize: 10485760,
maxFiles: 5
})
]
});
// Request middleware
app.use((req, res, next) => {
req.id = req.headers['x-request-id'] || uuidv4();
res.setHeader('X-Request-ID', req.id);
const context = {
requestId: req.id,
userId: req.user?.id,
ip: req.ip
};
asyncLocalStorage.run(context, () => next());
});
// Log wrapper with context
const log = {
info: (message, meta = {}) => {
logger.info(message, { ...asyncLocalStorage.getStore(), ...meta });
},
warn: (message, meta = {}) => {
logger.warn(message, { ...asyncLocalStorage.getStore(), ...meta });
},
error: (message, meta = {}) => {
logger.error(message, { ...asyncLocalStorage.getStore(), ...meta });
}
};
module.exports = log;
What I log:
- HTTP requests (status, duration, path)
- Errors (with full context)
- Business events (signups, payments, etc.)
- External API calls (success/failure, duration)
- Slow operations (> 1 second)
What I don't log:
- Passwords or tokens
- Full request/response bodies
- Debug info in production
- High-frequency events (cache hits)
- PII without explicit need
The Bottom Line
Good logging makes debugging possible. Bad logging makes breaches inevitable.
Use structured logging. JSON format with proper libraries, not console.log.
Log with context. Request IDs, user IDs, relevant business data - not sensitive info.
Respect log levels. ERROR means wake someone up. INFO means normal operations. DEBUG stays off in production.
Never log secrets. Passwords, tokens, credit cards, API keys - assume logs are public.
Monitor performance. Logging shouldn't consume more than 5% of your resources.
I've debugged production issues blind because of missing logs. I've responded to security breaches because of exposed logs.
Set up proper logging from day one. The first production incident will prove it was worth it.