Logging in Production: What to Log, What to Skip
TL;DR
Use structured logging with levels. Log requests, errors, and key events - not sensitive data or debug spam. JSON format for aggregation. Keep performance impact under 5%. Fix log leaks before they become breaches.
I spent three days debugging a production issue. Logs showed nothing. Then I found the problem - someone had set the log level to ERROR to "improve performance." We were blind to everything happening in production.
Two weeks later, our security team found customer credit cards in our logs. A developer had added console.log(req.body) while debugging and forgot to remove it. Logs were being shipped to five different systems. The breach notification cost us six figures.
Logging is either an afterthought or a firehose. Here's what I've learned about production logging from managing systems processing billions of requests.
The Console.log Problem
// Every codebase I've inherited
console.log('user:', user);
console.log('processing payment...');
console.log('data:', JSON.stringify(data));
console.log('HERE!!!'); // debugging from 2 years ago
console.log('wtf why is this broken');
console.log(req.body); // security nightmare
Problems:
- No context (timestamp, level, request ID)
- Can't filter or search effectively
- Logs everything including secrets
- No way to disable in production
- Terrible performance at scale
- Unstructured - can't parse or aggregate
I've seen production systems emitting 10,000 log lines per second, all from console.log. Finding anything in that stream is impossible.
Structured Logging: The Right Way
// Use a proper logging library
const winston = require('winston');
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
transports: [
new winston.transports.Console(),
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' })
]
});
// Now your logs are structured
logger.info('User logged in', {
userId: user.id,
ip: req.ip,
userAgent: req.headers['user-agent']
});
// Output:
// {
// "level": "info",
// "message": "User logged in",
// "timestamp": "2026-01-31T10:30:00.000Z",
// "userId": "usr_123",
// "ip": "203.0.113.1",
// "userAgent": "Mozilla/5.0..."
// }
Structured logs can be:
- Filtered by level
- Searched by field
- Aggregated and analyzed
- Correlated across services
- Alerted on automatically
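For example, with the JSON output above written to combined.log, you can already slice it from the command line. A quick sketch with jq (assuming one JSON object per line, which is what the file transport writes):
# All error-level entries for one user
jq 'select(.level == "error" and .userId == "usr_123")' combined.log
# Count entries by level
jq -r '.level' combined.log | sort | uniq -c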
Log Levels: What They Actually Mean
I see these used wrong constantly:
// ERROR - Something broke, needs immediate attention
logger.error('Database connection failed', {
error: err.message,
host: dbHost,
retryCount: 3
});
// WARN - Something's wrong but we're handling it
logger.warn('Rate limit exceeded', {
userId: user.id,
endpoint: req.path,
limit: 100
});
// INFO - Normal business events worth recording
logger.info('Payment processed', {
orderId: order.id,
amount: order.total,
paymentMethod: 'stripe'
});
// DEBUG - Detailed info for troubleshooting (off in production)
logger.debug('Cache lookup', {
key: cacheKey,
hit: cacheHit,
ttl: ttl
});
// TRACE - Super detailed (almost never needed)
logger.trace('Function entered', {
function: 'processOrder',
args: args
});
My production setup:
- ERROR: Wake me up at 3am
- WARN: Look at it tomorrow morning
- INFO: Normal operations, enabled always
- DEBUG: Disabled in production, enable for troubleshooting
- TRACE: Never used
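One thing that pays off: keep INFO as the default but give yourself a way to turn DEBUG on temporarily without a redeploy. A minimal sketch against the winston logger from earlier (the SIGUSR2 toggle is just one convenient trigger):
// Flip between info and debug at runtime - no restart, no config change
process.on('SIGUSR2', () => {
  logger.level = logger.level === 'debug' ? 'info' : 'debug';
  logger.info('Log level changed', { level: logger.level });
});
Then kill -USR2 <pid> turns debug on while you troubleshoot and off again when you're done.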
What to Actually Log
1. HTTP Requests (With Limits)
// Log every request (summarized)
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
logger.info('HTTP request', {
method: req.method,
path: req.path,
status: res.statusCode,
duration: duration,
ip: req.ip,
requestId: req.id,
// Don't log query params - might contain tokens
// Don't log body - might contain passwords
// Don't log headers - might contain auth tokens
});
});
next();
});
Don't log:
- Full request body (passwords, credit cards)
- Query parameters (tokens, API keys)
- Authorization headers
- Cookies with session data
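If you genuinely need request details, redact them before logging. A minimal sketch - the field list is illustrative, extend it for your own payloads:
const SENSITIVE_FIELDS = ['password', 'cardnumber', 'cvv', 'ssn', 'token', 'authorization'];

function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (!value || typeof value !== 'object') return value;
  return Object.fromEntries(
    Object.entries(value).map(([key, val]) => [
      key,
      SENSITIVE_FIELDS.includes(key.toLowerCase()) ? '[REDACTED]' : redact(val)
    ])
  );
}

// Sensitive fields are masked, everything else passes through
logger.info('Request body (redacted)', { body: redact(req.body) });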
2. Errors (With Context)
// BAD - No context
logger.error('Database error');
// GOOD - Actionable information
try {
await db.query(sql, params);
} catch (err) {
logger.error('Database query failed', {
error: err.message,
stack: err.stack,
query: sql.substring(0, 100), // Truncate long queries
userId: req.user?.id,
requestId: req.id,
// Don't log params - might contain PII
});
throw err;
}
Always include:
- Error message and stack trace
- Request ID for correlation
- User ID (if authenticated)
- What operation failed
- Relevant context (not sensitive data)
3. Business Events
// Track important business operations
logger.info('User registration', {
userId: user.id,
source: 'web',
plan: 'free'
});
logger.info('Subscription upgraded', {
userId: user.id,
fromPlan: 'free',
toPlan: 'pro',
revenue: 29.99
});
logger.info('Payment failed', {
userId: user.id,
orderId: order.id,
amount: order.total,
reason: 'insufficient_funds'
});
These logs become your analytics data source.
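Once these events land in your log aggregator, simple queries turn them into metrics. For example, a CloudWatch Logs Insights query over the payment events above (field names match the log calls; your platform's syntax may differ):
# Failed payments, broken down by reason
fields @timestamp, reason
| filter message = "Payment failed"
| stats count(*) by reason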
4. External API Calls
// Log external service interactions
async function callStripe(endpoint, data) {
const start = Date.now();
try {
const response = await stripe.post(endpoint, data);
logger.info('Stripe API call succeeded', {
endpoint: endpoint,
duration: Date.now() - start,
statusCode: response.status,
requestId: response.headers['request-id']
});
return response.data;
} catch (err) {
logger.error('Stripe API call failed', {
endpoint: endpoint,
duration: Date.now() - start,
error: err.message,
statusCode: err.response?.status,
stripeCode: err.code
});
throw err;
}
}
This helps debug third-party integration issues.
What NOT to Log
Security-Sensitive Data
// NEVER LOG THESE
logger.info('User login', {
email: user.email,
password: password, // ❌ NEVER
creditCard: req.body.cardNumber, // ❌ NEVER
ssn: user.ssn, // ❌ NEVER
apiKey: req.headers.authorization // ❌ NEVER
});
// Instead
logger.info('User login', {
userId: user.id,
// Email is borderline - depends on your threat model
email: user.email,
ipAddress: req.ip
});
Never log:
- Passwords (plain or hashed)
- Credit card numbers
- Social security numbers
- API keys or tokens
- Session IDs
- Private keys or certificates
- Unmasked PII
I've seen all of these in production logs. Don't be that person.
High-Volume Noise
// BAD - Logs every cache hit (10,000/sec)
function getFromCache(key) {
const value = cache.get(key);
logger.debug('Cache lookup', { key, hit: !!value }); // Too much!
return value;
}
// GOOD - Only log cache issues
function getFromCache(key) {
const value = cache.get(key);
// Only log misses for important keys
if (!value && key.startsWith('critical:')) {
logger.warn('Cache miss for critical key', { key });
}
return value;
}
Don't log:
- Every database query (log slow queries)
- Every cache hit (log misses for important data)
- Every function call (use profilers instead)
- Loop iterations
- Successful health checks
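For the database case, the usual compromise is a thin wrapper that only logs when a query is slow. A rough sketch (the 500ms threshold and the db.query shape are assumptions):
const SLOW_QUERY_MS = 500;

async function timedQuery(sql, params) {
  const start = Date.now();
  try {
    return await db.query(sql, params);
  } finally {
    const duration = Date.now() - start;
    if (duration > SLOW_QUERY_MS) {
      logger.warn('Slow query', {
        query: sql.substring(0, 100), // Truncate, and never log params
        duration
      });
    }
  }
}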
Full Objects
// BAD - Giant object in every log
logger.info('Processing order', { order: order }); // 500 fields
// GOOD - Only relevant fields
logger.info('Processing order', {
orderId: order.id,
userId: order.userId,
itemCount: order.items.length,
total: order.total
});
Logging entire objects:
- Bloats log storage
- Slows down logging
- Exposes sensitive data you didn't realize was there
- Makes logs hard to read
Request IDs: The Secret Weapon
const { v4: uuidv4 } = require('uuid');
// Add request ID to every request
app.use((req, res, next) => {
req.id = req.headers['x-request-id'] || uuidv4();
res.setHeader('X-Request-ID', req.id);
next();
});
// Include in every log
logger.info('Processing request', {
requestId: req.id,
path: req.path
});
logger.error('Database error', {
requestId: req.id,
error: err.message
});
Now you can trace a single request across:
- Multiple log entries
- Multiple services
- External API calls
- Database queries
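To make that work across services, forward the ID on every outbound call so the downstream service logs the same one. A minimal sketch using Node's built-in fetch (Node 18+; the header name matches the middleware above):
async function callDownstream(req, url, options = {}) {
  return fetch(url, {
    ...options,
    headers: {
      ...options.headers,
      'X-Request-ID': req.id // Same ID shows up in the downstream service's logs
    }
  });
}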
Pro tip: Return the request ID in error responses:
app.use((err, req, res, next) => {
logger.error('Request failed', {
requestId: req.id,
error: err.message,
stack: err.stack
});
res.status(500).json({
error: 'Internal server error',
requestId: req.id // Users can give you this for debugging
});
});
Performance: Logging Is Not Free
I once had an app that spent 30% of CPU time on logging. Here's what kills performance:
Synchronous Logging
// BAD - Blocks the event loop
function syncLogToFile(message) {
fs.appendFileSync('app.log', message + '\n'); // Blocks!
}
// GOOD - Async logging
const logger = winston.createLogger({
transports: [
new winston.transports.File({
filename: 'app.log',
// Winston writes async by default
})
]
});
Excessive String Concatenation
// BAD - Concatenates even if log level is disabled
logger.debug('User data: ' + JSON.stringify(hugeObject));
// If debug is disabled, you still paid for JSON.stringify!
// GOOD - Guard expensive work behind a level check
if (logger.isLevelEnabled('debug')) {
logger.debug('User data', { user: hugeObject });
}
// BETTER - If your logger supports lazy metadata, pass a callback
// so the object is only built when debug is actually enabled
logger.debug('User data', () => ({ user: hugeObject }));
Stack Trace Capture
// BAD - Captures stack trace for every log
logger.info('Request processed', {
stack: new Error().stack // Expensive!
});
// GOOD - Only capture for errors
logger.error('Request failed', {
error: err.message,
stack: err.stack // Already captured
});
My Performance Rules
- Keep logging under 5% of total CPU
- Buffer logs and flush async
- Sample high-frequency logs
- Disable debug logs in production
- Monitor logging overhead
Sampling: Handle High Volume
// Log 1% of successful requests, 100% of errors
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const shouldLog = res.statusCode >= 400 || Math.random() < 0.01;
if (shouldLog) {
logger.info('HTTP request', {
method: req.method,
path: req.path,
status: res.statusCode,
duration: Date.now() - start,
sampled: res.statusCode < 400 // Mark sampled logs
});
}
});
next();
});
This reduced our log volume by 95% while keeping visibility into problems.
Real-World Configurations
Node.js with Winston
const winston = require('winston');
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: 'api',
version: process.env.APP_VERSION,
environment: process.env.NODE_ENV
},
transports: [
// Console for local dev
new winston.transports.Console({
format: winston.format.combine(
winston.format.colorize(),
winston.format.simple()
)
}),
// File for production
new winston.transports.File({
filename: 'error.log',
level: 'error',
maxsize: 10485760, // 10MB
maxFiles: 5
})
]
});
module.exports = logger;
Python with structlog
import structlog
structlog.configure(
processors=[
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.processors.JSONRenderer()
],
context_class=dict,
logger_factory=structlog.PrintLoggerFactory(),
)
logger = structlog.get_logger()
# Usage
logger.info("user_login", user_id=user.id, ip=request.ip)
logger.error("db_error", error=str(e), query=sql[:100])
Go with zap
package main
import "go.uber.org/zap"
func main() {
logger, _ := zap.NewProduction()
defer logger.Sync()
logger.Info("user_login",
zap.String("user_id", userID),
zap.String("ip", req.RemoteAddr),
)
logger.Error("db_error",
zap.Error(err),
zap.String("query", sql),
)
}
Log Aggregation: Making Logs Useful
Structured logs are pointless if you can't search them:
CloudWatch Logs
const CloudWatchTransport = require('winston-cloudwatch');
logger.add(new CloudWatchTransport({
logGroupName: '/aws/lambda/my-function',
logStreamName: () => {
const date = new Date().toISOString().split('T')[0];
return `${date}-${process.env.AWS_LAMBDA_LOG_STREAM_NAME}`;
},
awsRegion: 'us-east-1'
}));
Search queries:
# Find all errors for a user
fields @timestamp, @message, error
| filter userId = "usr_123" and level = "error"
| sort @timestamp desc
# Slow requests
fields @timestamp, method, path, duration
| filter duration > 1000
| stats avg(duration) by path
ELK Stack (Elasticsearch, Logstash, Kibana)
// Ship logs to Elasticsearch
const ElasticsearchTransport = require('winston-elasticsearch');
logger.add(new ElasticsearchTransport({
level: 'info',
clientOpts: {
node: 'http://localhost:9200'
},
index: 'logs'
}));
Datadog
const datadog = require('winston-datadog');
logger.add(new datadog({
apiKey: process.env.DATADOG_API_KEY,
hostname: 'api-server',
service: 'web-api',
ddsource: 'nodejs'
}));
Context: The Middleware Pattern
// Add context to all logs in a request
const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();
app.use((req, res, next) => {
const context = {
requestId: req.id,
userId: req.user?.id,
ip: req.ip,
path: req.path
};
asyncLocalStorage.run(context, () => next());
});
// Helper to get context
function getLogContext() {
return asyncLocalStorage.getStore() || {};
}
// Now every log automatically includes context
function logWithContext(level, message, extra) {
logger[level](message, {
...getLogContext(),
...extra
});
}
// Usage anywhere in request
logWithContext('info', 'Payment processed', { orderId: order.id });
// Automatically includes requestId, userId, ip, path
Alerts: When to Wake Someone Up
// Example alert rules
const ALERT_RULES = {
// Error rate above 1%
high_error_rate: {
query: 'status >= 500',
threshold: 0.01,
window: '5m',
severity: 'critical'
},
// Slow requests
slow_requests: {
query: 'duration > 5000',
threshold: 0.05, // 5% of requests
window: '10m',
severity: 'warning'
},
// Payment failures
payment_failures: {
query: 'event = "payment_failed"',
threshold: 10, // 10 in 5 minutes
window: '5m',
severity: 'critical'
}
};
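These rules usually live in your log platform (CloudWatch alarms, Datadog monitors), but the logic is just a threshold over a window. A toy sketch of the error-rate rule - the counters and sendPage() are hypothetical stand-ins for your aggregator and paging integration:
function checkErrorRate(requestsInWindow, errorsInWindow) {
  const errorRate = requestsInWindow === 0 ? 0 : errorsInWindow / requestsInWindow;
  if (errorRate > ALERT_RULES.high_error_rate.threshold) {
    // sendPage() is a placeholder for PagerDuty, Opsgenie, etc.
    sendPage(`Error rate ${(errorRate * 100).toFixed(1)}% over the last 5 minutes`);
  }
}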
Alert fatigue is real. I only alert on:
- Error rate above baseline
- External API failures
- Payment processing issues
- Database connection failures
- Disk/memory approaching limits
Don't alert on single errors or expected failures.
Common Mistakes
Mistake 1: Logging Passwords
// BAD
app.post('/login', (req, res) => {
logger.info('Login attempt', { body: req.body });
// Just logged the password!
});
// GOOD
app.post('/login', (req, res) => {
logger.info('Login attempt', {
email: req.body.email,
ip: req.ip
});
});
Mistake 2: Logging in Loops
// BAD - 10,000 log entries
for (const item of items) {
logger.info('Processing item', { id: item.id });
processItem(item);
}
// GOOD - One log entry
logger.info('Processing batch', { count: items.length });
for (const item of items) {
processItem(item);
}
logger.info('Batch complete', { count: items.length });
Mistake 3: No Error Context
// BAD
try {
await processPayment(order);
} catch (err) {
logger.error(err.message); // No context!
}
// GOOD
try {
await processPayment(order);
} catch (err) {
logger.error('Payment processing failed', {
error: err.message,
stack: err.stack,
orderId: order.id,
userId: order.userId,
amount: order.total,
paymentMethod: order.paymentMethod
});
}
Mistake 4: Logging Too Much in Production
// BAD - Debug logs in production
logger.debug('Cache lookup', { key }); // 10,000/sec
logger.debug('Database query', { sql }); // 5,000/sec
logger.debug('Processing item', { item }); // 50,000/sec
// GOOD - Info only in production
if (process.env.NODE_ENV === 'production') {
logger.level = 'info';
}
Mistake 5: No Log Rotation
# Without rotation, logs fill disk
-rw-r--r-- 1 node node 45G Jan 31 10:00 app.log
# With rotation
-rw-r--r-- 1 node node 100M Jan 31 10:00 app.log
-rw-r--r-- 1 node node 100M Jan 30 10:00 app.log.1
-rw-r--r-- 1 node node 100M Jan 29 10:00 app.log.2
Use logrotate or Winston's maxsize/maxFiles.
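A minimal logrotate entry for the file transports above (paths and retention are examples; adjust for your host):
/var/log/myapp/*.log {
  daily
  rotate 7
  maxsize 100M
  compress
  missingok
  notifempty
  copytruncate
}
copytruncate matters here: Node keeps the file handle open, so a plain rename would leave the process writing to the rotated file instead of the new one.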
Security: Preventing Log Injection
// BAD - User input in logs
logger.info(`User ${req.body.username} logged in`);
// If username is: "admin\n{\"level\":\"error\",\"message\":\"System hacked\"}"
// Creates fake log entries!
// GOOD - Structured logging prevents injection
logger.info('User logged in', {
username: req.body.username // Safely escaped in JSON
});
Structured logging (JSON) prevents log injection attacks.
My Current Setup
After running production systems for years:
const winston = require('winston');
const { AsyncLocalStorage } = require('async_hooks');
const { v4: uuidv4 } = require('uuid');
const asyncLocalStorage = new AsyncLocalStorage();
// Create logger
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: process.env.SERVICE_NAME,
version: process.env.VERSION,
env: process.env.NODE_ENV
},
transports: [
new winston.transports.Console(),
new winston.transports.File({
filename: 'error.log',
level: 'error',
maxsize: 10485760,
maxFiles: 5
})
]
});
// Request middleware
app.use((req, res, next) => {
req.id = req.headers['x-request-id'] || uuidv4();
res.setHeader('X-Request-ID', req.id);
const context = {
requestId: req.id,
userId: req.user?.id,
ip: req.ip
};
asyncLocalStorage.run(context, () => next());
});
// Log wrapper with context
const log = {
info: (message, meta = {}) => {
logger.info(message, { ...asyncLocalStorage.getStore(), ...meta });
},
warn: (message, meta = {}) => {
logger.warn(message, { ...asyncLocalStorage.getStore(), ...meta });
},
error: (message, meta = {}) => {
logger.error(message, { ...asyncLocalStorage.getStore(), ...meta });
}
};
module.exports = log;
What I log:
- HTTP requests (status, duration, path)
- Errors (with full context)
- Business events (signups, payments, etc.)
- External API calls (success/failure, duration)
- Slow operations (> 1 second)
What I don't log:
- Passwords or tokens
- Full request/response bodies
- Debug info in production
- High-frequency events (cache hits)
- PII without explicit need
The Bottom Line
Good logging makes debugging possible. Bad logging makes breaches inevitable.
Use structured logging. JSON format with proper libraries, not console.log.
Log with context. Request IDs, user IDs, relevant business data - not sensitive info.
Respect log levels. ERROR means wake someone up. INFO means normal operations. DEBUG stays off in production.
Never log secrets. Passwords, tokens, credit cards, API keys - assume logs are public.
Monitor performance. Logging shouldn't consume more than 5% of your resources.
I've debugged production issues blind because of missing logs. I've responded to security breaches because of exposed logs.
Set up proper logging from day one. The first production incident will prove it was worth it.