Database Optimization Techniques for High-Performance Applications
Parijat Anand
CTO at D2 Enterprises
Database performance can make or break your application. A slow database means slow page loads, frustrated users, and lost revenue. Yet many applications suffer from preventable performance issues due to poor query design, missing indexes, or suboptimal configurations. Let's explore proven techniques to optimize database performance.
Understanding Database Performance
Before optimizing, you need to understand what you're optimizing for:
- Query response time: How quickly individual queries execute
- Throughput: Number of queries processed per second
- Concurrency: Handling multiple simultaneous connections
- Resource utilization: CPU, memory, disk I/O, network
- Scalability: Performance as data volume grows
1. Indexing Strategies
Indexes are the most powerful tool for improving query performance. They allow the database to find data without scanning entire tables.
When to Create Indexes
- WHERE clauses: Columns frequently used in filtering
- JOIN conditions: Foreign key columns
- ORDER BY clauses: Columns used for sorting
- GROUP BY clauses: Columns used for grouping
- Unique constraints: Enforce uniqueness efficiently
Index Types
B-Tree Indexes (Default):
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);
Composite Indexes:
CREATE INDEX idx_users_status_created ON users(status, created_at);
-- Efficient for:
WHERE status = 'active' AND created_at > '2024-01-01'
WHERE status = 'active'
-- NOT efficient for:
WHERE created_at > '2024-01-01' -- Doesn't use first column
Partial Indexes:
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
-- Smaller, faster for specific queries
SELECT * FROM users WHERE status = 'active' AND email = '[email protected]';
Full-Text Indexes:
CREATE FULLTEXT INDEX idx_products_search ON products(name, description);
SELECT * FROM products WHERE MATCH(name, description) AGAINST('laptop');
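The same idea can be tried out locally with SQLite's FTS5 extension, which is analogous to the MySQL FULLTEXT index above. This is a minimal sketch with illustrative data; it assumes your Python build ships with FTS5 enabled (most do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table maintains a full-text index over its columns
conn.execute("CREATE VIRTUAL TABLE products USING fts5(name, description)")
conn.executemany(
    "INSERT INTO products (name, description) VALUES (?, ?)",
    [("Gaming Laptop", "16GB RAM, fast SSD"),
     ("Desk Lamp", "LED, adjustable arm")],
)

# MATCH queries the full-text index instead of scanning every row
rows = conn.execute(
    "SELECT name FROM products WHERE products MATCH ?", ("laptop",)
).fetchall()
print(rows)  # [('Gaming Laptop',)]
```

Unlike a `LIKE '%laptop%'` scan, the MATCH lookup stays fast as the table grows because it consults the inverted index.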
Index Best Practices
- Don't over-index: Each index slows down writes and uses storage
- Monitor index usage: Remove unused indexes
- Consider column order: Most selective columns first in composite indexes
- Index foreign keys: Essential for JOIN performance
- Use covering indexes: Include all columns needed by query
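You can see a covering index at work with SQLite's `EXPLAIN QUERY PLAN` (table and index names here are illustrative). When the index contains every column the query needs, the plan reports a covering index and the table itself is never touched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, email TEXT)")
# Composite index that covers the query below (status for filtering, email for output)
conn.execute("CREATE INDEX idx_users_status_email ON users(status, email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE status = 'active'"
).fetchall()
# The detail column mentions "COVERING INDEX": the query is answered
# entirely from the index, with no table lookup
print(plan[0][3])
```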
2. Query Optimization
Well-written queries can be orders of magnitude faster than poorly written ones.
Use EXPLAIN to Analyze Queries
EXPLAIN SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.name;
Avoid SELECT *
-- Bad: Retrieves every column, including ones you don't need
SELECT * FROM users WHERE id = 123;
-- Good: Only retrieve needed columns
SELECT id, name, email FROM users WHERE id = 123;
Optimize JOINs
-- Ensure JOIN columns are indexed
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Use appropriate JOIN type
-- INNER JOIN: Only matching rows
-- LEFT JOIN: All left table rows + matches
-- Avoid CROSS JOIN unless intentional
Limit Result Sets
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;
-- Use pagination for large result sets
SELECT * FROM products
ORDER BY id
LIMIT 20 OFFSET 40;
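Note that `OFFSET` still reads and discards all the skipped rows, so deep pages get slower. A common alternative is keyset (seek) pagination: filter on the last id seen so the database seeks directly to the next page via the index. A minimal sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (id, name) VALUES (?, ?)",
                 [(i, f"product-{i}") for i in range(1, 101)])

def fetch_page(last_id, page_size=20):
    # WHERE id > ? replaces OFFSET: an index seek instead of
    # scanning and discarding the skipped rows
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = fetch_page(0)
page2 = fetch_page(page1[-1][0])  # pass the last id of the previous page
print(page2[0])  # (21, 'product-21')
```

The trade-off: keyset pagination can't jump to an arbitrary page number, but "next page" links stay fast no matter how deep the user scrolls.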
Avoid N+1 Query Problem
-- Bad: 1 query for users + N queries for orders
SELECT * FROM users;
-- Then for each user:
SELECT * FROM orders WHERE user_id = ?;
-- Good: Single query with JOIN
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
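The cost of N+1 is easy to demonstrate by counting queries from application code. A sketch using sqlite3 with illustrative table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'John'), (2, 'Jane'), (3, 'Bob');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# Bad: 1 query for users, then one more query per user
queries = 0
users = conn.execute("SELECT id, name FROM users").fetchall()
queries += 1
for user_id, _ in users:
    conn.execute("SELECT id FROM orders WHERE user_id = ?", (user_id,)).fetchall()
    queries += 1
print(queries)  # 4 round trips for 3 users (1 + N)

# Good: a single JOIN with an aggregate
rows = conn.execute("""
    SELECT u.name, COUNT(o.id)
    FROM users u LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()
print(rows)  # [('John', 2), ('Jane', 1), ('Bob', 0)]
```

With 10,000 users the loop version issues 10,001 round trips; the JOIN version still issues one.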
Use EXISTS Instead of COUNT
-- Bad: Counts every matching row just to test existence
SELECT * FROM users WHERE (SELECT COUNT(*) FROM orders WHERE user_id = users.id) > 0;
-- Good: Stops at first match
SELECT * FROM users WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = users.id);
3. Database Schema Design
Good schema design prevents performance problems before they start.
Normalization vs Denormalization
Normalization (reduce redundancy):
- Eliminates data duplication
- Ensures data consistency
- Requires JOINs for related data
- Good for write-heavy workloads
Denormalization (optimize reads):
- Stores redundant data
- Reduces JOINs
- Faster reads, slower writes
- Good for read-heavy workloads
Choose Appropriate Data Types
TINYINT -- -128 to 127, or 0 to 255 unsigned (1 byte)
SMALLINT -- -32,768 to 32,767 (2 bytes)
INT -- -2B to 2B (4 bytes)
BIGINT -- Very large numbers (8 bytes)
-- Use VARCHAR instead of CHAR for variable-length strings
VARCHAR(255) -- Only uses needed space
CHAR(255) -- Always uses 255 bytes
-- Use ENUM for fixed sets
status ENUM('pending', 'active', 'inactive')
Partition Large Tables
CREATE TABLE orders (
    id BIGINT,
    user_id INT,
    created_at DATE,
    ...
) PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026)
);
4. Caching Strategies
Reduce database load by caching frequently accessed data.
Application-Level Caching
- Redis/Memcached: Cache query results, computed values
- Cache invalidation: Update or remove when data changes
- TTL (Time To Live): Automatic expiration for stale data
Query Result Caching
# MySQL's query cache was removed in 8.0, so cache query results at the
# application level instead. Example with Redis (Python-style sketch):
key = "user:123:profile"
cached = redis.get(key)
if cached:
    return cached
else:
    data = db.query("SELECT * FROM users WHERE id = 123")
    redis.setex(key, 3600, data)  # Cache for 1 hour
    return data
Materialized Views
-- PostgreSQL syntax (MySQL has no native materialized views)
CREATE MATERIALIZED VIEW user_order_stats AS
SELECT
    user_id,
    COUNT(*) AS total_orders,
    SUM(total_amount) AS total_spent,
    MAX(created_at) AS last_order_date
FROM orders
GROUP BY user_id;
-- Refresh periodically
REFRESH MATERIALIZED VIEW user_order_stats;
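On engines without native materialized views (MySQL, SQLite), the same pattern can be implemented with an ordinary summary table rebuilt on demand. A minimal sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total_amount REAL);
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
    CREATE TABLE user_order_stats (user_id INTEGER PRIMARY KEY,
                                   total_orders INTEGER, total_spent REAL);
""")

def refresh_user_order_stats():
    # Equivalent of REFRESH MATERIALIZED VIEW: rebuild the precomputed rows
    conn.execute("DELETE FROM user_order_stats")
    conn.execute("""
        INSERT INTO user_order_stats
        SELECT user_id, COUNT(*), SUM(total_amount) FROM orders GROUP BY user_id
    """)

refresh_user_order_stats()
stats = conn.execute("SELECT * FROM user_order_stats ORDER BY user_id").fetchall()
print(stats)  # [(1, 2, 80.0), (2, 1, 20.0)]
```

Reads against the summary table are a cheap primary-key lookup instead of an aggregate over the full orders table; the cost is that the data is only as fresh as the last refresh.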
5. Connection Pooling
Reuse database connections instead of creating new ones for each request.
Connection Pool Configuration
const mysql = require('mysql');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'app_user',
  password: 'password',
  database: 'myapp',
  connectionLimit: 10,      // Max connections
  queueLimit: 0,            // Unlimited queue
  waitForConnections: true, // Wait if all busy
  acquireTimeout: 10000     // 10 second timeout
});
Pool Sizing Guidelines
- Start small: 10-20 connections often sufficient
- Monitor usage: Increase if connections frequently maxed out
- Consider database limits: Don't exceed max_connections
- Multiple app servers: Total connections = pool_size × servers
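To make the mechanism concrete, here is a minimal pool built on a thread-safe queue, illustrating what libraries like mysql's `createPool` do internally (greatly simplified: no health checks, timeouts, or reconnection; sqlite3 stands in for a real server connection to keep the sketch runnable):

```python
import queue
import sqlite3

class ConnectionPool:
    def __init__(self, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Real pools open server connections up front; we use sqlite3 here
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self):
        return self._pool.get()  # blocks if all connections are busy

    def release(self, conn):
        self._pool.put(conn)  # return the connection for reuse

pool = ConnectionPool(size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()
pool.release(conn)
print(result)  # (1,)
```

The key property is that `acquire` blocks rather than opening a new connection, which is what caps the load on the database at the configured pool size.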
6. Database Configuration Tuning
Optimize database server settings for your workload.
MySQL/MariaDB Key Settings
# Buffer pool size (typically 70-80% of RAM on a dedicated server)
innodb_buffer_pool_size = 8G
# Log file size (larger = better write performance)
innodb_log_file_size = 512M
# Connection limits
max_connections = 200
# Query cache (removed in MySQL 8.0; disable on older versions)
query_cache_size = 0
# Temporary tables
tmp_table_size = 64M
max_heap_table_size = 64M
PostgreSQL Key Settings
# Shared buffers (typically 25% of RAM)
shared_buffers = 4GB
# Effective cache size (50-75% of RAM)
effective_cache_size = 12GB
# Work memory per operation
work_mem = 64MB
# Maintenance work memory
maintenance_work_mem = 512MB
# WAL settings
wal_buffers = 16MB
checkpoint_completion_target = 0.9
7. Monitoring and Profiling
You can't optimize what you don't measure. Implement comprehensive monitoring.
Key Metrics to Monitor
- Slow query log: Queries exceeding threshold
- Query execution time: Average, P95, P99
- Connection count: Active, idle, waiting
- Cache hit ratio: Buffer pool effectiveness
- Disk I/O: Reads, writes, latency
- Lock contention: Deadlocks, wait times
- Replication lag: For read replicas
Enable Slow Query Log
-- MySQL
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- PostgreSQL
ALTER SYSTEM SET log_min_duration_statement = 1000; -- 1 second
SELECT pg_reload_conf();
Monitoring Tools
- MySQL: MySQL Workbench, Percona Monitoring and Management
- PostgreSQL: pgAdmin, pg_stat_statements
- General: Datadog, New Relic, Prometheus + Grafana
8. Read Replicas and Sharding
Scale beyond single-server limits with replication and sharding.
Read Replicas
- Primary: Handles all writes
- Replicas: Handle read queries
- Replication lag: Slight delay in replica data
- Use cases: Analytics, reporting, read-heavy apps
Sharding
- Horizontal partitioning: Split data across multiple databases
- Shard key: Determines which shard holds data (e.g., user_id)
- Challenges: Cross-shard queries, rebalancing
- When to shard: Single database can't handle load
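Shard routing on a user_id key can be sketched in a few lines. This uses a stable hash (MD5, not Python's process-randomized `hash()`) so routing stays consistent across application servers; the shard count is illustrative:

```python
import hashlib

NUM_SHARDS = 4  # illustrative; changing this requires rebalancing data

def shard_for(user_id):
    # Stable hash of the shard key, mapped onto the shard range
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# All of one user's rows land on the same shard, so single-user queries
# never cross shards; cross-user analytics still have to fan out.
shard = shard_for(12345)
print(0 <= shard < NUM_SHARDS)  # True
```

Production systems usually add a level of indirection (consistent hashing or a shard lookup table) so that adding shards doesn't remap every key, which is the rebalancing challenge noted above.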
9. Batch Operations
Process multiple records in single operations instead of loops.
Bulk Inserts
-- Bad: One INSERT statement per row
INSERT INTO users (name, email) VALUES ('John', '[email protected]');
INSERT INTO users (name, email) VALUES ('Jane', '[email protected]');
-- ... 1000 more times
-- Good: Single bulk insert
INSERT INTO users (name, email) VALUES
('John', '[email protected]'),
('Jane', '[email protected]'),
... -- All 1000 rows
('Bob', '[email protected]');
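From application code, the standard way to get a batched insert is the DB-API's `executemany`, which avoids a round trip per row. A sketch with generated placeholder data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

# Hypothetical rows for illustration
rows = [(f"user{i}", f"user{i}@example.com") for i in range(1000)]

# Bad: one execute() call (one round trip) per row
# for name, email in rows:
#     conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))

# Good: a single batched statement
conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1000
```

Wrapping the batch in one transaction matters too: committing per row forces a disk flush per row, while one commit for the whole batch flushes once.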
Batch Updates
-- Bad: One UPDATE per row
UPDATE users SET status = 'active' WHERE id = 1;
UPDATE users SET status = 'active' WHERE id = 2;
-- ...
-- Good: Single update with IN clause
UPDATE users SET status = 'active' WHERE id IN (1, 2, 3, ..., 1000);
10. Regular Maintenance
Keep your database healthy with regular maintenance tasks.
Maintenance Checklist
- Analyze tables: Update statistics for query planner
- Optimize tables: Defragment and reclaim space
- Update indexes: Rebuild fragmented indexes
- Purge old data: Archive or delete obsolete records
- Monitor growth: Plan for storage expansion
- Review slow queries: Optimize problematic queries
- Test backups: Ensure recovery procedures work
-- MySQL/MariaDB maintenance
ANALYZE TABLE users;
OPTIMIZE TABLE users;
-- PostgreSQL maintenance
VACUUM ANALYZE users;
REINDEX TABLE users;
Common Performance Pitfalls
- Missing indexes: Most common cause of slow queries
- Over-indexing: Slows down writes unnecessarily
- N+1 queries: Fetching related data in loops
- Large transactions: Holding locks too long
- Unbounded queries: No LIMIT on large tables
- Ignoring monitoring: Not knowing where problems are
- Premature optimization: Optimizing before measuring
Conclusion
Database optimization is an ongoing process, not a one-time task. Start with proper indexing and query optimization, monitor performance continuously, and scale strategically as your application grows.
Remember the 80/20 rule: often 20% of your queries cause 80% of performance issues. Focus on identifying and optimizing those critical queries first.
At D2 Enterprises, we've optimized databases handling billions of records and millions of queries per day. Whether you're experiencing performance issues or planning for scale, these techniques provide a solid foundation for building high-performance applications.
About Parijat Anand
Parijat is the Chief Technology Officer at D2 Enterprises. Our database specialists have optimized systems handling billions of records across MySQL, PostgreSQL, MongoDB, and other platforms, combining deep technical knowledge with practical, results-driven solutions.